COMPUTER AND PHOTOGRAMMETRIC GENERAL LAND USE STUDY
OF CENTRAL NORTH ALABAMA 


SUMMARY 

The report is divided into two sections which describe the photogrammetric and
computer land use study respectively, but, as a single entity, the report documents the 
step by step procedures used in the land use study from its inception to its completion. 
The land use scheme presented in Geological Survey Circular 671, entitled, “A Land Use 
Classification System for Use with Remote Sensor Data” by J.R. Anderson, E.E. Hardy, 
and J.T. Roach [1], was used as a standard for comparison of the land use maps developed
from conventional aerial photography and Earth Resources Technology Satellite (ERTS) 
digital multispectral scanner data of the central north portion of Alabama. The results 
include land use maps obtained from photo-interpretation of the aerial photography and 
from two computer programs operating on the ERTS data, as well as area percentages 
and spectral signatures of the land use categories. Minimum computer time and cost of 
such a survey are estimated for the entire state, and recommendations are made for 
additional needed efforts and improvements. 

It is worthwhile to mention that data classification or information management 
programs are one of many tools that are available to an investigator for developing a 
solution to his problem or particular question, but the programs are not the solutions to 
the problems themselves. Before contemplating the use of such tools, it would be wise 
for the investigator to specifically define the objectives to be accomplished, take note of 
the available resources, and evaluate several alternative ways for accomplishing the desired 
objectives. 


PHOTOGRAMMETRIC LAND USE STUDY OF CENTRAL NORTH ALABAMA 


Introduction 

A land use study of the central north portion of the State of Alabama was 
initiated utilizing participants with experience in various technical fields. Two of the 
participants, Robert Jayroe and Warren Campbell, were experienced in devising 
automated land use classification schemes which employ digital computer techniques
and which take their input from digital data tapes acquired by orbiting space platforms.
The third participant, Paul Larsen, had experience in the field of conventional
photogrammetric engineering.

The efforts of these participants were combined in a local area land use analysis 
for the purpose of demonstrating the applicability of automated classification schemes 
and photo-interpretation techniques to such an analysis. ERTS digital tapes, ERTS
imagery, high and intermediate altitude aerial photographs, and ground truth photography
were the primary information sources available to the participants while conducting the 
analysis. 

Early considerations discussed by the participants were as follows: (1) What 
resource should be evaluated in the study, or should the study deal with general land
usage? (2) What would the geographic bounds be for the area which was to be
automatically classified by computer, mosaicked, and mapped? (3) What was the quality
of ERTS data available for the selected area? (4) Was there high altitude aircraft imagery
(RB-57), and intermediate altitude aircraft imagery (U.S. Dept. of Agriculture) available
of the chosen area? and (5) Which of the available imagery and data would be selected 
for application to the study? Decisions resulting from these considerations are discussed 
in subsequent paragraphs. 

The primary plan of the project emerged: to develop land use maps by the use of 
(a) the Spatial and Spectral Clustering Program and ERTS digital tapes of the chosen 
area; (b) the Sequential Clustering Program and ERTS digital tapes of the chosen area; 
and (c) aerial photomosaics prepared from vertical imagery resulting from high and 
intermediate altitude aircraft missions over the chosen area. In addition, the day to day 
findings resulting from each of these approaches were to be exchanged among the several 
participants in order to “cross pollinate” and improve the efficiency and potential of the
results of each of the three techniques used in the study. 

ERTS imagery and digital tapes, for use in the two computerized land use 
classification techniques, were selected from the November 4, 1972, ERTS overpasses. 
Accordingly, the high altitude aircraft imagery chosen for use in the preparation of the 
small scale (1:144,300) mosaic of the land use study area, was that obtained by an RC-8
camera mounted in an RB-57 aircraft flown over the area on November 16, 1971. In this
mission, Eastman Kodak Aerochrome Infrared 2443 Film was used. 

In addition to the color infrared RB-57 imagery used in the small scale 
photomosaic preparation of the entire land use study area, black and white aerial 
photographs made available by the Department of Agriculture were used in the
preparation of several larger scale (1:20,000) mosaics of selected portions of the land use 
study area (please refer to the Appendix). The primary purposes of the mosaics were: (1) 
to serve as a basis for land use map preparation using conventional photo-interpretive 
techniques, and (2) to supplement the photographic ground truth of the land use study 
area, both synoptically at the smaller scale and in considerable detail at the larger scale. 
The mosaics and the photographic ground truth were of significant benefit in preparing 
the land use maps resulting from the conventional photo-interpretive techniques, and 
were equally beneficial when interpreting the map printouts produced by the automated 
computer schemes from digital tapes. 

Early in the project, when the geographical area for this study was being discussed, 
three small adjacent rectangular areas, approximately 48 statute miles by 33 statute miles
(77.2 km by 53 km), across central north Alabama, were considered. Pros and cons of the
land use related parameters of each area were “weighed” and compared so that the best
area for the study's purposes would be selected. A 55 statute mile by 33 statute mile
(88.5 km by 53 km) rectangular area centered around Athens, Alabama, (Fig. 1) was
eventually selected as the study area. Its western boundary was Wheeler Dam; its eastern
boundary was Huntsville Mountain just east of Huntsville, Alabama; the northern
boundary was the Tennessee-Alabama state line; and the southern boundary was a line
approximately 30 miles (48.2 km) south of, and parallel to, the Tennessee-Alabama state
line. 


At the beginning of this project, some consideration was given to slanting the 
study towards one specific type of land use such as agriculture, forestry, urban 
expansion, or transportation networks. After discussion and thought by the participants 
as to choosing one specific land use for the study, it was mutually agreed that a general 
land use study covering a variety of specific land uses would more satisfactorily evaluate 
the digital and photogrammetric techniques for land use applications. Similar work using 
photogrammetric techniques is reported in References 2 and 3. In addition, a 
decision was made to adopt the land use classification system specified in the Geological 
Survey Circular 671, dated 1972, as a standard for all three of the techniques being 
studied. Circular 671 is discussed briefly in the next Section. 

Some loss of photographic detail and degradation of overall image quality may be 
noticeable in the photographic illustrations. However, readers desirous of studying the 
original plates, from which the photographs were made for the report, may do so at the 
author’s installation. 


Land Use Maps Resulting From Aerial Photomosaics and 
Single Aerial Photographs 

PREPARATION OF AN AERIAL PHOTOMOSAIC OF THE LAND USE STUDY AREA 

Purpose: An aerial photomosaic of the Land Use Study Area, shown in Figure 1,
was produced (1) to provide a base for conventional land use map preparation, and (2) to 
provide a useful information source which would be helpful in interpreting the computer 
classification maps produced by the automatic data classification schemes. 

Method of Preparation: Before an aerial photomosaic can be produced, it is 
necessary to determine what photographic imagery of the area is available, and then 
choose which of that available imagery will be used to best suit the purpose of the 
mosaic. To mosaic an area the size of our Land Use Study Area, which is approximately 
2,700 square miles or 6,900 square kilometers, it is necessary to choose imagery of a 
scale such that an excessively large number of photographs are not required. 
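
The effect of scale on the number of photographs can be illustrated with a short calculation. The following Python sketch gives only a rough lower bound under stated assumptions: it assumes square 9-in. prints at both scales and ignores the overlap between adjacent frames that a real mosaic requires, so actual counts would be higher. The function names are illustrative and are not part of the study.

```python
import math

INCHES_PER_MILE = 63_360  # 5,280 ft x 12 in.


def ground_side_miles(scale_denominator, photo_side_in=9.0):
    """Ground distance (statute miles) spanned by one side of a square photo."""
    return photo_side_in * scale_denominator / INCHES_PER_MILE


def min_photo_count(area_sq_miles, scale_denominator, photo_side_in=9.0):
    """Lower bound on photo count: total area divided by single-photo coverage.

    Assumes no overlap between adjacent photographs (an idealization).
    """
    side = ground_side_miles(scale_denominator, photo_side_in)
    return math.ceil(area_sq_miles / (side * side))


# Approximately 2,700 square miles, as stated in the text.
for denom in (20_000, 120_000):
    side = round(ground_side_miles(denom), 2)
    print(f"1:{denom:,}  side = {side} mi  photos >= {min_photo_count(2_700, denom)}")
```

Under these assumptions, at least 335 photographs would be needed at 1:20,000 against about 10 at 1:120,000, which illustrates why smaller scale imagery is the practical choice for a synoptic mosaic.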

One good source of current imagery which is available for most of the United States 
is the Soil Conservation Service of the Department of Agriculture. This agency sells
vertical photographs which are taken every few years at a scale of 1 to 20,000. These
were considered, but only for detailed examinations of certain parts of the Land Use
Study Area, and not for the overall synoptic view, due to the obvious limitation of space
required to lay out such a large mosaic.

Figure 1. USGS map of land use study area.

Another good source of available imagery was NASA RB-57 imagery obtained as 
part of NASA’s Earth Observations Aircraft Program. Three sets of RB-57 imagery, each 
of which included the area of the Land Use Study, were located in the Environmental 
Applications Office of Marshall Space Flight Center. The overflights are specified as 
follows: Mission 166, Flight 8, flown on May 21, 1971; Mission 191, Flight 1, flown on
November 8, 1971; and Mission 191, Flight 4, flown on November 16, 1971. The
appropriate frames for these three flights, both the RC-8 camera imagery and the Zeiss
camera imagery, were reviewed for applicability to the Land Use Study Area. Of the
above three flights, Flight 1 of Mission 191, flown on November 8, 1971, was eliminated
from contention because of cloud conditions over the eastern portion of the Land Use
Study Area. The remaining two flights, the May 21 and November 16 imagery of 1971,
were both virtually cloud free and could be used for photomosaic purposes. Of these two
flights the November 16, 1971, Flight 4 imagery of the RC-8 camera was chosen, because
November 1972 ERTS imagery had been selected earlier for use in the automatic data
classification schemes to be employed in this cooperative study.

After choosing the November 16, 1971, RB-57 imagery for the mosaic, the next 
step was to decide which frames were needed from this mission, and then have prints 
made. The prints were produced in the Photographic Division of Marshall Space Flight 
Center, by first making 4- by 5-in. exposures of the 9- by 9-in. RC-8 positive
transparencies and then enlarging the 4- by 5-in. format up to 8- by 10-in. prints. The 
component photographs of the mosaic were then prepared, trimmed, feather-edged, and 
laid using azimuth line control in accordance with conventional photogrammetric 
techniques. Due to the reduction in format size of the component prints from 9 in. by 9 
in. down to 8 in. by 8 in., the resultant scale was 1:144,300 compared to 1:120,000 on
the original positive transparencies. The mosaic shown in Figure 2 has been
photographically reduced to an approximate scale of 1:586,000. The film in the RC-8
camera in the RB-57 at the time the exposures were made was Kodak Aerochrome 
Infrared-2443 with an Estar base. While viewing the photomosaic shown in Figure 2, the 
reader may find it helpful to be aware of the following basic characteristics of Kodak’s 
Aerochrome Infrared-2443 Film. The three layers of the 2443 false color film are 
sensitive to green, red and infrared radiation instead of the usual blue, green, and red 
used for normal rendition of the visible spectrum. In the final transparencies or prints
produced from this film, blue results from green exposure, green results from red
exposure, and red results from infrared exposure.
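
The scale figures quoted for the mosaic can be related to one another with simple arithmetic. The Python sketch below converts a ground distance to its length on a map of a given scale and computes the linear reduction between two scales. The 55-statute-mile figure is the east-west extent of the study area given earlier; the function names are illustrative only.

```python
INCHES_PER_MILE = 63_360  # 5,280 ft x 12 in.


def map_length_inches(ground_miles, scale_denominator):
    """Length on the map (inches) of a ground distance given in statute miles."""
    return ground_miles * INCHES_PER_MILE / scale_denominator


def linear_reduction(from_denominator, to_denominator):
    """Factor by which map lengths shrink when going to the smaller scale."""
    return to_denominator / from_denominator


STUDY_AREA_WIDTH_MILES = 55  # east-west extent of the study area

for label, denom in (("mosaic as laid", 144_300), ("Figure 2 as reproduced", 586_000)):
    width = map_length_inches(STUDY_AREA_WIDTH_MILES, denom)
    print(f"{label}: 1:{denom:,}, study area about {width:.1f} in. wide")

print(f"reduction factor: {linear_reduction(144_300, 586_000):.2f}")
```

On these figures the study area spans roughly 24 in. on the mosaic as laid and about 6 in. after a reduction of approximately four times.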

In forest survey work, diseased foliage can be identified and distinguished from 
healthy foliage by interpretation of infrared reflectance of the foliage as recorded on the 
film. With this film you can also distinguish between deciduous and evergreen trees. 
Healthy deciduous trees have a much higher infrared reflectivity than do healthy 
evergreens. In the spring and the summer healthy deciduous trees show up as magenta or 
red, healthy evergreens show up as bluish-purple, and dead or dying deciduous leaves or 
evergreen needles photograph bright green since they have lost their infrared reflectivity. 
In the fall, red leaves photograph yellow, and yellow leaves photograph white. 
In the photomosaic shown in Figure 2, the reader, even after brief study, will be
able to identify the Tennessee River and Wheeler Lake, running north-westerly in the 
bottom half of the mosaic. The large city and urban area of Huntsville is seen in the 
eastern end of the mosaic, and to the southwest of Huntsville is Decatur, on the south 
side of the Tennessee River. Due north of Decatur is the town of Athens. Other small 
towns such as Courtland, Rogersville, Gurley, and Lexington are identifiable. Forested 
areas are recognizable on many parts of the mosaic. A little more careful study will 
enable the individual to locate highways and railroad beds. Interstate 65, running 
north-south near the center of the mosaic, was not completed across the Tennessee River 
at the time this imagery was taken. Huntsville-Decatur Jetport is visible to the south of 
Highway 72 connecting Huntsville and Decatur, about midway between these two cities. 
Near the town of Courtland, which is approximately 10 miles south-southwest of the 
junction of Elk River and the Tennessee River in the southwest portion of the mosaic, 
can be seen the old Courtland Army Air Base. 

In general, grazing areas and rich pasture land in this mosaic show up as shades of 
pink or red due to the high infrared reflectivity of this type of land. Careful
discrimination of tonal qualities allows one to differentiate between the deciduous forests 
and coniferous forests. To help the reader in locating points of interest in the mosaic, a 
portion of the 1:250,000 USGS map was included as Figure 1 at a reduced scale.

MAP PRODUCT BASED ON USGS CIRCULAR 671 

The United States Geological Survey, in its circular 671 published in 1972, 
described a Land Use Classification System [1] for use with remote sensor
data and proposed it for review and testing. At the outset of our land use study, we
decided to adopt the scheme as proposed in circular 671, in view of the large number of 
land use classification schemes available, all with various advantages and disadvantages. 
Perhaps one good way to describe this Land Use Classification System is to quote the 
abstract of the circular: “The framework of a national land use classification system is 
proposed for testing and review. The classification has been developed to meet the needs 
of Federal and State agencies for an up-to-date overview of land use throughout the 
country on a basis that is uniform in date, scale, and categorization at the more 
generalized first and second levels and that will be receptive to data from instrumented 
satellite and high-altitude aircraft platforms. The classification system utilizes the best 
features of existing widely used classification systems to the extent that they are 
amenable to use with remote sensing, and it is open-ended so that regional, state, and 
local agencies may develop more detailed land use classification systems, at third and 
fourth levels, to meet their particular needs and at the same time remain compatible with 
each other, and with the national system.”

A synoptic land use map of the study area, which was prepared from the aerial 
photomosaic, is shown in Figure 3. The following categories, taken from Level I of 
Circular 671’s scheme, together with their respective color codes, were used: 1. Urban 
and built-up land - yellow; 2. Forest land - green; 3. Water - blue; and 4. Agricultural
and Rangeland - white. It is pointed out that in Circular 671, agricultural land and
rangeland are considered as two separate categories, but for the purposes of this land use map
the two are lumped together in view of the scale of the photomosaic from which the 
map was prepared. The fields and agricultural areas are so small on a map of this scale 
that it would have been unrealistic to divide them out into the separate categories of 
agricultural land and rangeland. Also, in this part of the country, grazing fields are very 
often interspersed among the agricultural or crop producing fields. In addition to the 
various land use categories, listed above, which have been shown on this land use map, 
highways were added by using red strips and railroads were shown by using black strips. 
The other items of information which were shown on the map, but which were not 
photointerpreted from the mosaic or its component photographs, were county boundaries 
shown in southern Tennessee and northern Alabama with small black dashes, and the 
state boundary between Alabama and Tennessee, shown with large black dashes. 

This map product, although quite simple and inexpensive, does enable one to 
quickly assess the extent of the area’s primary land uses. It is evident from the map that 
a little less than three quarters of the area is being used for various agricultural 
applications, including grazing. A little less than one quarter is forest land, and smaller 
amounts are occupied by urban and built-up applications such as residential, commercial, 
industrial, transportation, and institutional. A more precise accounting of the land use 
components in the area has not been included here since certain land usage category 
simplifications were made during the map’s preparation. 
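
One way such area proportions could be tallied from a categorized map is to sample the map on a grid of equal-area cells and count the cells falling in each category. The Python sketch below illustrates the idea. The sampling approach is an assumption made for illustration (the report does not state how the proportions were judged), and the sample cells are invented, not data from the study. The color codes are those used on the map in Figure 3.

```python
from collections import Counter

# Level I categories and color codes used on the small scale map.
LEVEL_I_COLORS = {
    "urban and built-up": "yellow",
    "forest": "green",
    "water": "blue",
    "agricultural and rangeland": "white",
}


def category_percentages(cells):
    """Percentage of equal-area sample cells assigned to each category."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented sample of 20 equal-area cells, for illustration only.
sample_cells = (
    ["agricultural and rangeland"] * 14
    + ["forest"] * 4
    + ["urban and built-up"]
    + ["water"]
)

for category, percent in category_percentages(sample_cells).items():
    print(f"{category} ({LEVEL_I_COLORS[category]}): {percent:.0f} percent")
```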

At the time the author prepared this map, based upon the mosaic shown in 
Figure 2, an accurate recording of time spent on its production was not made. However, 
during the final phase of our land use study, the co-authors felt that comparative time 
and material figures for the three mapping techniques would be of interest to the readers. 
Accordingly, it was estimated that approximately 35 engineering/photo-interpretation
man hours and $5.00 in material costs were expended in the production of the small scale 
land use map of Central North Alabama illustrated in Figure 3. 

LARGE SCALE LAND USE MAP OF THE JONES VALLEY AREA 

Figure 4 is a map showing the various land use categories existing in an 
8.25-square-mile (21.4-square-kilometer) area in southeast Huntsville, Alabama, known as
Jones Valley. This map was prepared from a United States Department of Agriculture
aerial photograph taken at a scale of 1:20,000 on December 7, 1970, but it is shown in
Figure 5 at a reduced scale. The USDA serial number of this photograph is HM-5MM-63.
On December 7, 1970, sequential coverage of most of the city of Huntsville was obtained
in addition to coverage of other parts of Madison County.

One reason for selecting a photograph of this scale for use in our land use study 
was to demonstrate the ease of preparing a land use map of an area, using very basic and
fundamental engineering tools and principles. In preparing this map, sequential
photographs taken before HM-5MM-63 and photographs taken subsequent to it in the
same flight line were utilized. Stereographic pairs of adjacent photographs were
combined in such a fashion as to enable one to view the imagery three-dimensionally with a
simple lens stereoscope. Another reason for choosing this particular frame for study use
was that the large range area shown surrounding the Jones Farm, just to the right of
center of the photograph, is also quite visible on ERTS-1 imagery (Band 7) of this area.

Figure 4. Land use, Jones Valley, Huntsville, Alabama.

Most areas of the photograph were easy to assign to appropriate land use 
categories with the help of a light table and some magnification. In several instances a 
stereoscopic examination was required to discriminate between agricultural land and
rangeland, between commercial zones and institutional areas, and to locate power lines. But,
with reasonable effort, these choices could be made without too much difficulty. 

One important factor that must be considered when making a land use study of 
any type, is that current USDA Agricultural Stabilization and Conservation Service 
(ASCS) imagery is available for most parts of the country. It is a high contrast imagery 
of good cartographic quality and excellent resolution. It is readily available and quite 
inexpensive. It can be purchased by anyone from the Eastern Aerial Photography 
Laboratory in Asheville, North Carolina. 

The map shown in Figure 4 was prepared using the Land Use Classification System 
specified in Geological Survey Circular 671 as its basis. The legend on the map shows the 
color codes selected for this map for the various types of urban and built-up land, 
agricultural land, rangeland, forest land, and water. Thus five of the nine “Level I” land 
use classifications and four “Level II” classifications as specified in Circular 671, were 
utilized in the preparation of this land use map. The four Level I land use categories 
which really are not pertinent to the small study area shown, and which therefore were 
not considered for mapping were nonforested wetland, barren land, tundra, and 
permanent snow and ice fields. 

While preparing this map, undeveloped lands currently not in use, which were 
adjacent to or within existing commercial areas, were considered to be “undeveloped 
commercial.” These areas were color coded the same color as commercial, but outlined in 
black. In like manner, residential areas which were not completely developed at the time 
this imagery was obtained, were color coded residential and outlined in black since they 
were to be subsequently developed. Main roads were shown with black dashes and TVA 
transmission lines with red dashes. 

This type of “small area and large scale” map is quite helpful to those engaged in
activities such as urban expansion studies, farming, cattle grazing enterprises, city 
government planning, forest resource utilization, real estate marketing, and parks and 
recreational area planning. 

A cursory review of the mapped area shows that approximately 40 percent of the 
area is residential, 25 percent range, 20 percent forest, and the balance of the area is being 
used for agricultural, institutional, and other purposes. 
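
These approximate percentages can be converted into areas. The following Python sketch applies them to the 8.25-square-mile extent of the mapped area. Grouping the remainder into a single entry and the conversion constant are choices of the sketch, and the results inherit the approximate nature of the percentages.

```python
TOTAL_SQ_MILES = 8.25          # extent of the Jones Valley map
SQ_KM_PER_SQ_MILE = 2.589988   # conversion constant

# Approximate shares stated in the text, in percent.
stated_percent = {"residential": 40, "range": 25, "forest": 20}

# The balance covers agricultural, institutional, and other uses.
balance_percent = 100 - sum(stated_percent.values())
percent_by_category = dict(stated_percent)
percent_by_category["agricultural, institutional, and other"] = balance_percent


def areas_sq_miles(total_sq_miles, percents):
    """Area in square miles for each category, given its percentage share."""
    return {category: total_sq_miles * pct / 100 for category, pct in percents.items()}


areas = areas_sq_miles(TOTAL_SQ_MILES, percent_by_category)
for category, sq_mi in areas.items():
    sq_km = sq_mi * SQ_KM_PER_SQ_MILE
    print(f"{category}: {sq_mi:.2f} sq mi ({sq_km:.2f} sq km)")
```

On these figures the balance is about 15 percent, or a little over one square mile, with residential use covering about 3.3 square miles.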
As was the case of the small scale map of the land use study area shown in Figure
3, a record of time spent in preparing the map (Fig. 4) from the photograph (Fig. 5) was 
not kept. However, a realistic estimate of time and materials required would be 10 
engineering/photo-interpretation man hours and less than $1.00 in material costs. 

As a contrast to the smaller scale land use map of Central North Alabama shown 
in Figure 3, one can see that the resolution or degree of detail in a large scale map such 
as that of Jones Valley at a scale of 1:20,240 (scale somewhat reduced in this report) 
permits the map to reveal a greater amount of information per unit area than does a 
smaller scale land use map such as that in Figure 3. Both map types have their purposes 
and both have their limitations. An important consideration relative to the preparation of 
large scale land use maps of this type is that one can easily prepare a map of his own 
geographic area of interest, using readily available current imagery and basic engineering 
tools, which will serve a variety of basic and everyday needs, such as those examples
mentioned above. 

A reproduction of the aerial photograph from which the large scale map was 
prepared is shown in Figure 5. A cursory examination reveals the large rangeland area
on the right, and route US-231 (Memorial Parkway) which runs north-northwest on the 
west edge. Residential areas appear to the north, west, and south of the range area. Some 
scattered agricultural areas remain, but are gradually being taken over by residential,
commercial, and institutional land users. Forested areas remain, but in some cases these 
also are falling victim to residential area expansion. 


General Comments and Recommendations 

While performing the analyses reported herein, generous use was made of the 
1:20,000 scale USDA panchromatic imagery for large scale land use map preparation. 
This was done because the scale of USDA imagery is approximately equal to that of the 
computer printout resulting from processing of the ERTS multispectral scanner tapes, 
thereby facilitating direct comparisons of information content. In addition, ERTS prints 
at a scale of 1:1,000,000 did not provide the resolution or degree of detail desired in
the selected land use study area. 

For the synoptic or small scale land use map preparation, RB-57 obtained infrared 
imagery was used at a scale of approximately 1:144,300.

Conclusions made, based on this section of the reported work, are listed as 
follows: 


1. The 1:144,300 Infrared Aerial Photomosaic contains a greater amount of land
use category detail and information than can be efficiently and economically mapped at 
that scale manually. 


2. It does provide, however, a very good Level I mapping base, provided
rangeland and agricultural land can be assumed to be lumped into one category for the
mapping purpose under consideration. 

3. If the original RB-57 infrared imagery positive transparencies were magnified
from 1:120,000 scale up to the scale of the sequential classification scheme and spatial
and spectral classification scheme printouts (approximately 1:20,000), then the visual 
land utilization information content of the two would be reasonably similar. 

4. The level of detail reported on a 1:20,000 scale land use map, prepared 
manually from USDA black and white aerial photographs using basic stereoscopic 
interpretation techniques, compares favorably with the information content provided by a 
similar scale printout version of either the spatial and spectral or the sequential 
automated classification schemes. 

5. The 1:144,300 scale land use map of central north Alabama, prepared from 
the infrared aerial photomosaic of the same scale, does not provide the degree of land use 
component detail that the 1:20,000 scale automated classification scheme prepared map 
does. However, the variety of applications for land use maps ensures the continual use of
each type. 

6. A large scale map of a small area, devoted to uses such as commercial and 
neighborhood urban expansion or perhaps truck farms raising small acreage cash crops, 
could certainly be enhanced by the application of the automated classification schemes, 
with their inherent ability to define many spectral signatures, if smaller areas (i.e., 50 ft 
by 50 ft) could be classified. On the other hand, the cheaper, less detailed, manually 
prepared smaller scale version land use map would serve well in applications such as 
broad forest area harvest planning, range and farmland expansion studies and hydrological 
energy resource planning and utilization studies. 

7. Land use maps prepared by using conventional photogrammetric engineering 
techniques and current aerial photography can be compiled directly and quite 
inexpensively from the imagery in one basic operation; however, updated aerial 
photography of a scale suitable for land use map preparation becomes available usually 
only once every 4 to 5 years. Conversely, land use maps prepared by automated 
computerized land use classification techniques, as discussed in this report, must be 
prepared using several computer steps or iterations; however, updated digital tapes 
providing ground scene/land use intelligence become available during the sensor’s orbital 
lifetime for application to land use mapping projects much more frequently than the 
aforementioned updated aerial photography, thereby facilitating the desirable temporal 
updating feature more frequently. 
COMPUTER LAND USE STUDY OF CENTRAL NORTH ALABAMA


Introduction 

Two computer programs developed by members and under sponsorship of the 
Aerospace Environment Division, Marshall Space Flight Center, are evaluated for
application to land use mapping. These two programs are called the Composite Sequential
Clustering Program (CSCP) [4] and the Spatial and Spectral Clustering Program
(SSCP) [5]. Although the two programs are designed to have similar end products,
two different approaches are used. The ultimate objectives of both programs were to 
minimize the requirements of human judgement and intervention in the data analysis 
loop and to perform the analysis as automatically as possible, because of the potential 
need for analyzing large volumes of earth observation data. Thus, an attempt was made 
to perform the analysis with little a priori knowledge of ground truth, and to limit 
subjective judgement concerning the analysis to the interpretation of the results. 
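
To make the idea of analysis without prior ground truth concrete, the following Python sketch groups spectral vectors purely by their similarity to one another. It is a generic, simplified illustration and is not the algorithm of either CSCP or SSCP, whose internals are not described in this section; the distance threshold, the function name, and the sample vectors are all assumptions of the sketch.

```python
import math


def sequential_cluster(vectors, threshold):
    """Group vectors in a single pass without any labeled training data.

    Each vector joins the nearest existing cluster if it lies within
    `threshold` of that cluster's mean; otherwise it starts a new cluster.
    Returns the cluster label of each vector and the final cluster means.
    """
    means, counts, labels = [], [], []
    for vector in vectors:
        nearest, nearest_dist = None, None
        for index, mean in enumerate(means):
            dist = math.dist(vector, mean)
            if nearest_dist is None or dist < nearest_dist:
                nearest, nearest_dist = index, dist
        if nearest is None or nearest_dist > threshold:
            means.append(list(vector))
            counts.append(1)
            labels.append(len(means) - 1)
        else:
            counts[nearest] += 1
            n = counts[nearest]
            # Update the running mean of the chosen cluster.
            means[nearest] = [m + (x - m) / n for m, x in zip(means[nearest], vector)]
            labels.append(nearest)
    return labels, means


# Invented two-channel amplitudes for five ground points.
points = [(10, 12), (11, 13), (50, 52), (49, 51), (10, 11)]
labels, means = sequential_cluster(points, threshold=5.0)
print(labels)
```

The resulting clusters carry no names; relating each cluster to a land use category is the interpretation step, which is where ground truth and human judgement enter.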

Three distinct advantages can be realized from performing the analysis without 
prior knowledge of ground truth, and especially in cases where the analysis area is quite 
large. First, the analysis is performed with a minimum amount of prejudgement. The 
second advantage is that ground truth patrols can be directed to collect ground truth in a 
manner which minimizes cost, manpower, and time. Thirdly, the maps resulting from the 
analysis can also be used as an information base where it is desirable to use considerable 
human judgement in the analysis or, if required, to improve the accuracy of results. 

Although the programs were written with the above mentioned intentions in
mind, plenty of restart capabilities and places for supervision of the results were also
included. There were several reasons for doing this. First, more often than not, it is better to
be safe than sorry, especially where large amounts of data and computer time are
involved. Second, the programs are more flexible; and third, the programs are easily
adaptable to interactive processing, where that capability is available. As more experience
and confidence are obtained from using these programs, the amount of required
supervision can be decreased, but at present the amount of supervision desired is left up
to the individual user.

The computer programs were evaluated based upon the required resources and the 
results obtained. Specific resources that were considered were cost, time, equipment, and 
human involvement, but the results had to be subjectively evaluated on accuracy and 
usefulness of information. Although the computer analysis was performed on ERTS 
digital data, the accuracy comparative analysis was performed with aerial photography at 
scales of 1:20,000 and 1:144,300. The aerial photography that was available had been 
acquired during the same month but two years prior to the ERTS data collection. Although
some ground scene changes were noted, the photography did have the advantage of
providing the needed detail at a scale more comparable to the computer output.
The main objectives of the computer analysis were to produce a map of at least
the Level I land use categories discussed in the photogrammetric section and to 
determine how well the photogrammetrically derived land use classification scheme agreed 
with the computer classification scheme. Lower level category information was retained 
on the computer maps when identification was possible, but no attempt was made to 
distinguish crop categories because of the ground truth collection requirements. 


General Equipment and Data Requirements 

The required input data for both programs consists of digital data contained on a 
magnetic tape with a maximum of 12 allowable channels of data. It is not mandatory to 
analyze all 12 channels simultaneously, since an option is provided for selecting any 
combination and number of channels equal to or less than 12. The input data can be 
acquired from any number of sensors, such as from multispectral photography and/or 
scanners, as long as the channels of the data set are matched point for point on the 
ground scene. Then, each point on the ground scene is represented by a spectral vector 
whose components are the amplitude of the data in the corresponding channels. 

Both programs are written in Fortran IV and are currently running on an IBM 
7094/44 in a 32K core size environment. Originally, the programs were written in 
modular form, but the modules have been interfaced because the core storage was 
available. As a result, the programs require anywhere from 4 to 12 tape drives depending 
upon which analysis options are exercised. When core storage and tape drives become a 
constraining factor, the programs can be separated into their original modules and run in 
sequential order with the output of the previously run module being used as an input to 
the next module. 

At present, the output from both computer programs is limited to standard 
computer printout, microfilm, and Xerox copy obtained from the microfilm. Typically, 
all statistical information is limited to the printout, while the maps that are generated are 
output in all three forms. The maps can then be examined at a variety of scales for 
imagery comparison. The reader is invited to consult References 6, 7, 8, and 9 for more 
specific and detailed information. 

A set of ERTS (Earth Resources Technology Satellite) imagery and associated 
bulk MSS (Multispectral Scanner) digital data was obtained for the analysis. The image, 
ERTS-E-1104-15552, acquired on November 4, 1972, covered an area of approximately 
13,000 square kilometers over North Central Alabama. The digital data associated with 
the image are contained on 4 magnetic tapes with each tape containing 4 spectral channels 
of data, 2,340 scans/channel, and 810 data samples/scan/channel. The 4 channels of data 
cover the spectral ranges 0.5-0.6, 0.6-0.7, 0.7-0.8, and 0.8-1.1 micrometers. 
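
The size of the data set follows directly from these figures. The short Python sketch 
below works the arithmetic; it assumes that the 4 tapes hold adjacent strips of the same 
scans, each 810 samples wide (an assumption consistent with the 2,430 samples/scan 
quoted later for a 3-tape map), and the function name is illustrative only. 

```python
# Figures quoted in the text for the ERTS scene used in the study.
TAPES = 4
CHANNELS = 4
SCANS_PER_CHANNEL = 2340
SAMPLES_PER_SCAN_PER_TAPE = 810


def scene_size():
    """Return (ground points, total data values) for the full scene.

    Assumption: the tapes are side-by-side strips, so a full scan line
    is TAPES * 810 samples long.
    """
    samples_per_full_scan = TAPES * SAMPLES_PER_SCAN_PER_TAPE
    ground_points = SCANS_PER_CHANNEL * samples_per_full_scan
    total_values = ground_points * CHANNELS
    return ground_points, total_values


if __name__ == "__main__":
    points, values = scene_size()
    print(f"{points:,} ground points, {values:,} data values")
```

Under that assumption the scene holds about 7.6 million ground points, each represented 
by a 4-component spectral vector. 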



Spatial and Spectral Clustering Program (SSCP) 

THOUGHT PROCESS USED IN DEVELOPING THE PROGRAM 

The program was originally developed with aircraft scanner data in mind, which 
meant that the data would probably be acquired in one long continuous strip. The 
objective was to develop a program which could analyze the data, with minimum 
supervision and without a priori knowledge of ground truth, and output spectral 
signatures and a map showing the location of all the different unidentified spectral 
features. The identification of the various features would then be accomplished by 
collecting the appropriate ground truth. 

As a model for the development process, an attempt was made to determine how 
a human observer would analyze the data, given an unlimited amount of time and 
resources, and then try to write this procedure down in terms of a computer language 
and mathematics. Since the end product of the analysis was to be a map showing the 
geographic location of spectrally different features, this suggested looking for changes in 
the data where the ground scene changed from one feature to another. Ideally, one 
would like to be able to draw a map of the data set showing regions occupied by a 
particular feature, similar to what a photo-interpreter does when he produces a land use 
map from a photographic image. These two ideas suggested the use of a contour or grey 
level program or a boundary program. The contour or grey level programs were ruled out 
because they only operate on one channel of data at a time, and when many data 
channels are used, it is expedient and desirable to produce one map from all data 
channels simultaneously. Two boundary mapping programs were developed and they are 
discussed in detail in the Boundary Map section. One of the difficulties encountered in 
producing a boundary map was the problem of not being able to completely surround a 
particular feature with a closed boundary, and this produced problems for the next stage 
in the analysis procedure. This problem is due to the varying degrees of changes that are 
present within the data, and sometimes these changes, within an apparently homogeneous 
area, are as large as changes that exist between different spectral features that are 
geographically adjacent. Thus, a tradeoff usually exists: the boundary map should 
completely surround every feature, yet it should not contain so many boundaries as to 
render the map useless. 

Assuming that a reasonable boundary map of the ground scene can be produced, 
the next step is to determine a way of fetching the data from each homogeneous area 
without mixing the data with a spectrally different feature. Since there are boundaries 
that do not completely surround a feature, the possibility of fetching data simultaneously 
from two spectrally different features exists. The method adopted for fetching the data 
from the homogeneous areas is explained in detail in the section covering spatial 
clustering. The procedures described thus far have concentrated on the geographical 
aspects of the data rather than the spectral aspects, and as a result, there has been a 
considerable reduction in the amount of data to be handled and examined. The boundary 
map has reduced the original number of data channels down to one, and the spatial 
clustering is performed only on the one channel boundary map and only in homogeneous 
areas of a minimum size. The output of this section of the program is a boundary map 
that contains the location of the spatial clusters, or the cluster map. 

The next step is to examine the spectral information associated with each spatial 
cluster to determine how many different spectral features are obtained and which 
features are spectrally similar. This is accomplished by inputting the cluster map and the 
raw data into the spectral merging routine. Again, not all of the raw data is examined, 
but only the data contained within the geographic location of the spatial clusters. Data 
from different geographic areas are combined or merged, if the areas represent spectrally 
similar features, until finally a finite number of spectrally different features or classes 
remain. The decision logic for combining or merging data from differing geographic areas 
is also discussed in the section covering spectral merging. 

In order to produce a computer classification map, it is now necessary to check 
each data point to determine whether or not it belongs to one of the remaining spectrally 
distinct features. The decision logic is discussed in the section covering classification. 

For additional background information and computer program listings the reader 
is invited to consult References 5, 10, and 11. The Analysis Procedure Section contains 
information that is somewhat repetitive of References 5, 10, and 11, but it is repeated to 
provide continuity and visibility to the data analysis procedure. The next section 
however, was written with added detail and does contain new changes that were made in 
the programming logic. 

ANALYSIS PROCEDURE 

Boundary Maps 

Because core storage is limited, SSCP was written to handle a maximum of 255 
samples/scan/channel for a total of 12 channels of data. As a result of this constraint, an 
ERTS tape is analyzed in 4 separate strips, where 3 of the strips contain 202 
samples/scan/channel and the 4th strip contains 204 samples/scan/channel. 
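
The strip widths quoted above can be reproduced with a few lines of Python. This is 
only a sketch of the partitioning arithmetic; the function name and the choice to give 
the remainder to the last strip are assumptions that happen to match the 202, 202, 202, 
and 204 sample widths stated in the text. 

```python
def strip_bounds(samples_per_scan=810, n_strips=4):
    """Split one scan line into n_strips column ranges (start, end).

    Each strip gets samples_per_scan // n_strips columns; the last strip
    also takes whatever remainder is left over.
    """
    width = samples_per_scan // n_strips
    bounds = []
    start = 0
    for strip in range(n_strips):
        last = strip == n_strips - 1
        end = samples_per_scan if last else start + width
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    for number, (start, end) in enumerate(strip_bounds(), start=1):
        print(f"strip {number}: columns {start}..{end - 1} ({end - start} samples)")
```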

The basic objective of the boundary program is to categorize the digital data 
representing the ground scene into homogeneous and nonhomogeneous areas. This is 
accomplished by comparing the spectral vector of a particular point in the ground scene 
with the spectral vector of the adjacent point in the same data scan and with the spectral 
vector of the adjacent point in the same data column. The program has to be able to 
detect changes in the north-south direction of the data set as well as in the east-west 
direction. This not only indicates spatial continuity within a scan line but also across scan 
lines. If there is a significant difference between the spectral vector of the particular 
ground scene point in question and the spectral vectors of the adjacent ground scene 
points, then there must also be a significant spectral difference in what is being observed 
in the ground scene. Thus, a large amount of change in adjacent data points indicates 
that the ground scene is changing from one particular feature to a spectrally different 
feature. The measure of this change is a calculation proportional to the 
Euclidean vector distance, where the dimension of the vector space is equal to N, the 
number of data channels. In mathematical terms, let {}_k x_{ij} be the amplitude of the 
data in channel k at scan i and column j. Then, the change in the data in the scan 
direction is given by 

    S_i = \frac{1}{N} \sum_{k=1}^{N} \left( {}_k x_{ij} - {}_k x_{i,j-1} \right)^2        (1) 

and the change in the column direction is given by 

    S_j = \frac{1}{N} \sum_{k=1}^{N} \left( {}_k x_{ij} - {}_k x_{i-1,j} \right)^2        (2) 

The factor N^{-1} is used to minimize the effect of S_i and S_j increasing as the number of 
channels increases when different data sets are used. This is an important consideration 
since unscaled distributions of S_i and S_j may vary considerably with the number of 
channels and data set, and only a fixed amount of core storage can be allocated for the 
distributions. 
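
The two change measures can be illustrated with a minimal Python sketch. It assumes 
the squared-difference form of equations (1) and (2), with S_i comparing a point to its 
neighbor in the same scan and S_j comparing it to its neighbor in the same column; the 
function name and the nested-list layout data[k][i][j] are illustrative and are not taken 
from the Fortran programs. 

```python
def scan_and_column_change(data, i, j):
    """Return (S_i, S_j) for the point at scan i, column j.

    data[k][i][j] is the amplitude in channel k at scan i, column j.
    Assumed form: mean over channels of the squared difference between
    the point and its neighbor in the same scan (S_i) or in the same
    column (S_j). Requires i >= 1 and j >= 1.
    """
    n_channels = len(data)
    s_i = sum((data[k][i][j] - data[k][i][j - 1]) ** 2
              for k in range(n_channels)) / n_channels
    s_j = sum((data[k][i][j] - data[k][i - 1][j]) ** 2
              for k in range(n_channels)) / n_channels
    return s_i, s_j


if __name__ == "__main__":
    # Two channels, two scans, two columns.
    data = [
        [[10, 10], [12, 14]],   # channel 1
        [[20, 20], [20, 22]],   # channel 2
    ]
    print(scan_and_column_change(data, 1, 1))
```

Dividing by the number of channels keeps the values in a comparable range when the 
same code is run with 4 channels or with 12. 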

The boundary program used in the analysis, called the Sequential Boundary 
Program, considers two consecutive scans of data at a time, scan i and scan i-1. For these 
two scans, distributions of S_i and S_j are computed using integer class intervals, and both 
distributions are divided into seven categories. These categories are determined by finding 
six increasing values of S_i (S_i1, S_i2, ..., S_i6) and six increasing values of S_j (S_j1, S_j2, ..., S_j6) 
such that 59 percent of the data points in scan i have values of S_i and S_j equal to or less 
than S_i1 and S_j1, 64 percent have values less than or equal to S_i2 and S_j2, 69 percent 
have values less than or equal to S_i3 and S_j3, 74 percent have values equal to or less than 
S_i4 and S_j4, 79 percent have values less than or equal to S_i5 and S_j5, and 89 percent 
have values equal to or less than S_i6 and S_j6. In order to determine whether or not a 
given data point is a boundary point, the values of S_i and S_j are examined for each 
individual data point in scan i. If S_i is greater than S_j, then the S_i distribution is 
consulted for that particular data point; otherwise the distribution for S_j is consulted. 
Based upon which distribution was consulted, the data point is labeled according to 
the category to which the larger of the two values, S_i or S_j, belongs. The possible labels 
for each data point are the integers 0, 1, 2, ..., 6, where 0 represents the 59 percent level or less, 
6 the 64 percent level or less, 5 the 69 percent level or less, 4 the 74 percent level or less, 3 
the 79 percent level or less, 2 the 89 percent level or less, and 1 represents greater than 
the 89 percent level. A map of the ground scene is then produced on digital tape that 
only contains the integers 0 through 6, and the program user now has the option of 
choosing which categories will produce boundary points and which will not. The 
parameter in the program which determines the boundary map is the variable NLEVEL. 
Based upon previous experience with several different data sets, NLEVEL is typically set 
to 4, although it has an allowable range of from 1 to 6. If NLEVEL is set to 4, then 
non-zero values on the map equal to or less than 4 are defined to be boundary points. 
Another tape for printing or for microfilm is produced containing a map showing the 
location of boundary points only. 
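
The labeling logic described above can be sketched in Python as follows. This is an 
illustration under stated assumptions, not a transcription of the Fortran program: the 
function names are hypothetical, each threshold is taken as the smallest value having at 
least the stated percentage of points at or below it, and ties between S_i and S_j are 
resolved in favor of the S_j distribution, as the text implies. 

```python
PERCENT_LEVELS = (59, 64, 69, 74, 79, 89)
# Label assigned when a value is at or below the corresponding level;
# anything above the 89 percent level is labeled 1.
LABELS = (0, 6, 5, 4, 3, 2)


def thresholds(values):
    """Six increasing cut values for one distribution (S_i or S_j)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for pct in PERCENT_LEVELS:
        rank = -(-pct * n // 100)          # ceil(pct * n / 100)
        cuts.append(ordered[max(rank, 1) - 1])
    return cuts


def label_point(s_i, s_j, cuts_i, cuts_j):
    """Label a point 0..6 from the distribution of the larger value."""
    value, cuts = (s_i, cuts_i) if s_i > s_j else (s_j, cuts_j)
    for limit, label in zip(cuts, LABELS):
        if value <= limit:
            return label
    return 1


def is_boundary(label, nlevel=4):
    """Non-zero labels equal to or less than NLEVEL are boundary points."""
    return 0 < label <= nlevel
```

For example, with NLEVEL = 4 a point labeled 6 or 5 is left as a non-boundary point, 
while a point labeled 4, 3, 2, or 1 becomes a boundary point. 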

There are several assumptions in the above programming logic that are slightly 
hidden and worthwhile to explore. The first assumption is that the number of boundary 
points per scan is almost a fixed percentage. Setting NLEVEL equal to 4, for example, 
means that approximately 34 percent of the data per scan will end up being boundary 
points. There could be a lesser percentage of boundary points if many cases were 
encountered where, say, S_i is greater than S_j and S_i is not indicative of a boundary point 
but S_j is, or vice-versa. There also could be a larger percentage of boundary points, which 
is usually the case, because the S_i's that are indicative of boundary points do not always 
occur simultaneously with S_j's that are indicative of boundary points, and vice-versa. The 
third possibility is that the population contained in the different levels probably will not 
occur in exactly 5- or 10-percent increments, i.e., one level may have approximately 5 
percent while the next level may only contain 4 percent. 

From an aesthetic point of view, the idea of a fixed number of boundary points 
per scan may not be pleasing, but the logic has its advantages. One advantage is that the 
program runs fairly fast and only requires one pass through the raw data. The second 
point is that the boundaries are determined from conditions contained in a small local 
area rather than from the overall conditions of the entire data set. One disadvantage of 
using local information for determining boundaries occurs when scattered clouds are 
present in the data. The clouds tend to absorb all of the boundary points, because of 
their inhomogeneity, and no patterns can be delineated in the ground scene. 

If clouds do present a problem in the data, an alternate boundary program, called 
the Joint Boundary Program, can be used. As the name implies, the program computes 
the joint distribution of S_i and S_j, but for the entire data set, and requires one pass 
through the raw data tape and another pass through an intermediate tape. The joint 
distribution is calculated in integer class intervals using equations (1) and (2), and is 
limited to a 50 by 50 array. From past experience the array was found to be large 
enough so that any data point with a value of S_i or S_j larger than 50 could be 
automatically defined as a boundary point for practically any data set. As the raw data 
are read into the program and the number of occurrences of S_i and S_j combinations are 
accumulated in the joint histogram, a new tape is created for later use. This tape contains 
the integers 1 through 50^2 = 2,500, and instead of writing S_i and S_j on this tape for each 
data point, one number is written on the tape that gives the location of S_i and S_j in the 
joint histogram. The location, I, of S_i and S_j in the joint histogram is computed using 

    I = (51) S_i + S_j        (3) 

and is a unique one-dimensional representation of the two dimensions S_i and S_j. The new 
tape containing an integer I for each data point eliminates the necessity for recalculating 
S_i and S_j for each data point a second time. After all the data elements from the data 
set have been exhausted and the joint distribution is complete, a decision is made as to 
which combinations of S_i and S_j are to be considered as boundaries. The decision curve 
for a boundary point is based upon the formula 

    > (2)BLIM        (4) 

where S_i' and S_j' are the values of S_i and S_j at the mode of the joint histogram and 
ASCN, ACOL, IPOW, and BLIM are input parameters to the program. The input 
parameters allow for a wide variety of decision curve shapes and positions. Nominally 
IPOW = 2 and BLIM = 1, whereas ASCN is usually equal to ACOL and must be 
estimated based upon experience. The possible values of I, or correspondingly S_i and S_j, 
are then inserted into equation (4) and the variable N(I) in the computer program is set 
equal to 0 if equation (4) is not satisfied and is set equal to -1 if equation (4) is satisfied. 
The tape containing I for each data point is then read into the program, and the value of 
N(I) is written out as another tape that contains only the integers 0 and -1, with -1 
representing a boundary point. The Joint Boundary Program runs slightly longer than the 
Sequential Boundary Program, because of the use of an intermediate tape. Both programs 
use alphanumeric characters to convert the integer tapes to paper or microfilm maps. For 
example, the alphanumeric symbol for a zero is usually a blank, while the symbol for a 
-1 is usually a period representing a boundary point. 
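
The two-pass bookkeeping of the Joint Boundary Program can be sketched in Python as 
shown below. The sketch assumes that S_i and S_j have already been reduced to integers 
from 0 to 50 and uses the multiplier 51 printed in equation (3). The decision curve of 
equation (4) is not reproduced; instead the caller supplies a predicate, and all function 
names here are hypothetical. 

```python
from collections import Counter

SIDE = 51  # multiplier from equation (3); assumes S_i, S_j are integers 0..50


def joint_index(s_i, s_j):
    """Pack the pair (S_i, S_j) into the single integer I."""
    return SIDE * s_i + s_j


def split_index(index):
    """Recover (S_i, S_j) from I."""
    return divmod(index, SIDE)


def histogram_mode(index_tape):
    """Return (S_i', S_j'), the most frequent pair on the intermediate tape."""
    index, _count = Counter(index_tape).most_common(1)[0]
    return split_index(index)


def build_lookup(is_boundary):
    """Tabulate N(I): -1 where the caller's decision rule holds, else 0."""
    return {joint_index(a, b): (-1 if is_boundary(a, b) else 0)
            for a in range(SIDE) for b in range(SIDE)}


def boundary_tape(index_tape, lookup):
    """Second pass: translate each stored I into 0 or -1."""
    return [lookup[index] for index in index_tape]


if __name__ == "__main__":
    pairs = [(1, 1), (1, 1), (2, 1), (30, 4), (1, 40)]
    tape = [joint_index(a, b) for a, b in pairs]
    mode_i, mode_j = histogram_mode(tape)

    # Hypothetical stand-in for equation (4): distance from the mode.
    def rule(a, b):
        return abs(a - mode_i) + abs(b - mode_j) > 10

    print(boundary_tape(tape, build_lookup(rule)))
```

Because N(I) is tabulated once for every possible I, the second pass needs only a table 
lookup per data point, which is why S_i and S_j never have to be recomputed. 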

A boundary map from the Sequential Boundary Program of the North Central 
Alabama test site is shown in Figure 6. The map covers 3 ERTS tapes of image 
ERTS-E-1104-15552, which have been divided into a total of 12 strips, and is 1,300 scans 
by 2,430 samples/scan. The brightest area in the map is the Tennessee River, since the 
data representing the water present very little change within the river, except at the river 
edge. The other large areas that appear homogeneous are mostly uniformly forest covered 
areas. 


A color composite of the ERTS imagery corresponding to the boundary map is 
shown in Figure 7 for comparison. The color composite was made by illuminating 3 of 
the 4 ERTS image bands with red, green, and blue light, and hand registering the 
resulting images on a color composite camera system. The image was then photographed 
to obtain a negative from which a print could be made. Typically, all of the photographic 
products presented in the computer section show varying amounts of degradation due to 
the number of photographic reproductions required for producing the report figures. 
Compare Figures 6 and 7 with Figures 2 and 3 also. 

Examination of the boundary maps provides several types of information. First of 
all, the boundary map provides a simultaneous check on the condition of the data in all 
channels. Bad places in the data tend to show up conspicuously. Secondly, the boundary 
map provides a means of correlating the location (scan and column) of the data on the 
digital tape with particular geographic areas in a photographic image or another map. This 
information can then be used to select appropriately scaled aerial photography for ground 
truth comparison. Different features in the ground scene will appear on the map with 
varying degrees of homogeneity. By comparing the size and homogeneity of the areas 
occupied by particular features with aerial photography, some indication can be obtained 
for determining the types of information that can be retrieved from the data set. Also, 
this type of examination can be useful for manually selecting particular training areas 
when it is desirable to produce a map of only a few specific features for a particular 
application. Finally, the boundary map is examined from a quality point of view and 
compared with imagery to see whether there are enough boundary points to properly 
delineate homogeneous areas in the ground scene or whether the map contains too many 
boundary points. As a general rule of thumb, experience has indicated that approximately 
34 percent of the data points in a data set should be boundary points. Table 1 gives the 
exact percentages of boundary points for each strip. 

The next step in the procedure was to select one of the twelve strips for further 
analysis. This strip was selected on the basis of whether or not it contained a majority of 
the Level I category land use classes, which were urban and built-up land, agricultural 
land, rangeland, forest land, water, non-forested wetland, and barren land. Tape 3, strip 2 
seemed to satisfy most of these requirements, and the boundary map from that strip was 
input into the Spatial Clustering Program. 

North Alabama study area. 

TABLE 1. PERCENTAGE OF BOUNDARY POINTS FOR EACH DATA STRIP USING 
SEQUENTIAL BOUNDARY PROGRAM WITH NLEVEL = 4 


Spatial Clustering 

The Spatial Clustering Program, discussed in this section, and the Spectral Merging 
and Classification Programs, discussed in the next two sections, are all part of what is 
called the Composite Classification Program. The three programs are discussed separately 
because their functions are different. The spatial clustering segment of the composite 
program can accept a boundary tape from either the Sequential or Joint Boundary 
Program. In either case a boundary point is mathematically represented by -1 in the 
program and non-boundaries by 0’s. The purpose of the spatial clustering program is to 
locate and label homogeneous areas of a minimum size or larger contained within a 
boundary. Ideally, this is done in such a way that a labeled area does not contain data 
representing a combination of two spectrally different features. This is accomplished by 
using a fixed-shape p by p data point array which moves through the boundary map only 
in the scan or column direction. Initially, a homogeneous area in the boundary map is 
found which is large enough for the array to fit into. The area covered by the array is 
designated as belonging to cluster 1, and the array is allowed to move in this area until a 
boundary point is encountered, after which the array can no longer move in that direction. 
All data point locations falling within the movement of the array are said to belong to 
cluster 1. As the boundary tape is read into the spatial cluster program, a cluster tape is 
written out which contains 0’s and -1’s as on the boundary tape, but in addition, the 
data locations that were previously 0, but now belong to cluster 1, are written out as 1’s 
on the cluster tape. After the array can no longer move and engulf new data point 
locations, another location is found which will contain the p by p array. All the data 
point locations engulfed by the array will be designated as belonging to cluster 2, and the 
corresponding 0’s on the boundary tape will be converted to 2’s on the cluster tape. This 
process is continued until all the boundary map data have been exhausted. Occasionally, 
two differently numbered clusters run together. When this occurs, and the two 
clusters overlap by 4 or more scans, they are spatially merged and designated as the same 
cluster. After the spatial merging, and in order to keep the bookkeeping straight, the 
clusters are renumbered so that the cluster numbers will always have a continuous range 
from 1, ..., N. The output of the spatial clustering program is a cluster tape containing 
the integers -1, 0, 1, ..., N, which is essentially the same as the boundary tape except that 
some of the 0’s have been changed to non-zero positive integers. 
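
The moving-array idea can be sketched in Python as a region-growing pass over the 
positions where the p by p array fits. This is a simplified illustration, not the program's 
actual logic: the whole map is assumed to be in memory rather than read scan by scan, 
the rule for merging clusters that overlap by 4 or more scans is omitted, a cell reached 
by two clusters keeps the first label it received, and the function name is hypothetical. 

```python
def spatial_clusters(boundary_map, p):
    """Label homogeneous areas that can hold a p-by-p array.

    boundary_map holds -1 for boundary points and 0 elsewhere.
    Returns (cluster_map, number_of_clusters); cluster_map keeps the
    -1's and replaces covered 0's by cluster numbers 1, 2, ...
    """
    rows, cols = len(boundary_map), len(boundary_map[0])

    def fits(r, c):
        # True when the array with top-left corner (r, c) covers no boundary.
        return all(boundary_map[r + dr][c + dc] == 0
                   for dr in range(p) for dc in range(p))

    positions = {(r, c)
                 for r in range(rows - p + 1)
                 for c in range(cols - p + 1) if fits(r, c)}
    cluster_map = [row[:] for row in boundary_map]
    unvisited = set(positions)
    label = 0
    for start in sorted(positions):
        if start not in unvisited:
            continue
        label += 1
        unvisited.discard(start)
        stack = [start]
        while stack:
            r, c = stack.pop()
            # Everything under the array at this position joins the cluster.
            for dr in range(p):
                for dc in range(p):
                    if cluster_map[r + dr][c + dc] == 0:
                        cluster_map[r + dr][c + dc] = label
            # The array may step one point in the scan or column direction.
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nxt in unvisited:
                    unvisited.discard(nxt)
                    stack.append(nxt)
    return cluster_map, label
```

In a small test map with a one-point gap in a boundary, a 2 by 2 array cannot pass 
through the gap, so the areas on either side receive different cluster numbers and the 
gap itself stays 0. 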

The problem of gathering data from two spectrally different features, because of 
the gaps in the boundary between the two features, can be eliminated if the p by p array 
is chosen large enough. Since the array cannot occupy a data point location that is 
designated as a boundary, the array will generally be large enough so that it cannot pass 
through the smaller size gaps in a boundary. Previous experience has indicated that a 10 
by 10 array size is adequate, which insures that the minimum population of a cluster will 
be 100. There does exist, however, a trade-off between the array size and the number 
of distinct spectral features that can be detected. By using a 10 by 10 array size, it is 
possible to obtain data from homogeneous areas of a size equal to or greater than 10 by 
10 only, and it is possible to completely miss spectrally distinct features contained in 
smaller homogeneous areas. If the array is reduced in size, then a greater risk is presented 
in terms of mixing data from two spectrally distinct features, which may result in an 
inaccurate classification map. The classification map can be completed later by making a 
second pass through the data, as explained in another section. 

A printout of the cluster tape obtained from tape 3, strip 2 using a 10 by 10 
cluster array is shown in Figure 8. In order to show the geographic location of the 
clusters, the print made from the microfilm had to be reduced to a size where it was 
impossible to read the computer symbols on the map. As a result, the cluster numbers have 
been labeled to the side of the map. Enlarged portions of the cluster map will be shown 
in the next section. Examination of the cluster map reveals that it would not be possible 
to completely classify all of the data. For example, there are no clusters in the Tennessee 
River representing water. Table 2 lists the cluster numbers and their populations. 

Spectral Merging 

Until now the majority of the information considered was concerned with the 
spatial or geographic description of the data. The spectral merging program is the first 
segment of the composite classification program that is specifically concerned with 
spectral information and uses the raw data and cluster tapes as inputs. The program uses 
the cluster tape to locate the raw data belonging to each cluster by skipping through the 
cluster tape, and correspondingly through the raw data tape, until the integer 1 occurs. 
Then, for all the data belonging to cluster 1, mean values for each data channel, 
covariance matrix, eigenvalues, and eigenvectors are calculated. When all the data have 
been exhausted for cluster 1, the program skips on the cluster tape, and correspondingly 
on the raw data tape, until the 2’s of cluster 2 are found. Statistics are then calculated 
from the data belonging to cluster 2. 
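
A minimal Python sketch of the first part of this step is given below. It gathers the 
spectral vectors that fall under one cluster number and computes their mean vector and 
covariance matrix. The function name and the data[k][i][j] layout are illustrative, the 
covariance is divided by the cluster population (the text does not say whether the 
population or the population minus one was used), and the eigenvalue and eigenvector 
calculation is left to a symmetric-matrix eigensolver that is not shown. 

```python
def cluster_statistics(cluster_map, data, cluster):
    """Return (mean vector, covariance matrix, population) for one cluster.

    cluster_map[i][j] is the cluster number at scan i, column j;
    data[k][i][j] is the amplitude in channel k at the same location.
    """
    n_channels = len(data)
    vectors = [
        [data[k][i][j] for k in range(n_channels)]
        for i, row in enumerate(cluster_map)
        for j, label in enumerate(row)
        if label == cluster
    ]
    population = len(vectors)
    mean = [sum(v[k] for v in vectors) / population
            for k in range(n_channels)]
    covariance = [
        [sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in vectors) / population
         for b in range(n_channels)]
        for a in range(n_channels)
    ]
    return mean, covariance, population
```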

The program has now reached a stage where it is possible to compare the statistics 
of two clusters and, based upon the decision logic, decide whether or not the two 
clusters contain spectrally similar information, which means that both clusters represent 
spectrally similar ground scene features. Although the mathematics of the decision logic is 
described in detail in Reference 11, it appears helpful to repeat some of it here. 
Basically, it was desirable to surround the data belonging to a cluster with an 
n-dimensional closed surface, so that the region occupied by the data in the n-dimensional 
vector space would be bounded. The calculation of the mean values, or first order 
moments, for each channel would determine the location of the closed surface in the 
vector space, which meant that the shape of the closed surface had to be described by 
second or higher order moments. In order to minimize the number of moment 
computations, only second order moments, the covariance matrix, were used and the 
resulting closed surface was an n-dimensional hyperellipse. The ellipse equation for each 
cluster was computed in the principal axis coordinate system of the data belonging to 
that cluster using the eigenvalues and eigenvector matrix, which was obtained from the 
covariance matrix. The decision rule for deciding whether or not two clusters are spectrally 
similar is that the data from both clusters are spectrally similar if the mean vector of 
each cluster is contained within the ellipse of the other cluster. Mathematically, the 
decision is made by computing the vector joining the centers of mass of the two clusters 

    [{}_k\bar{x}_1 - {}_\ell\bar{x}_1, {}_k\bar{x}_2 - {}_\ell\bar{x}_2, {}_k\bar{x}_3 - {}_\ell\bar{x}_3, {}_k\bar{x}_4 - {}_\ell\bar{x}_4] = [Z_1, Z_2, Z_3, Z_4] ,        (5) 

where k and ℓ stand for cluster k and ℓ, respectively, and the bars represent the 
average of the data in channels 1, 2, 3, and 4. This vector is transformed into the 
principal axis system of both clusters via the eigenvector matrix to produce the vector 

    [{}_k Z_1', {}_k Z_2', {}_k Z_3', {}_k Z_4']        (6) 

in cluster k’s coordinate system, and to produce the vector 

    [{}_\ell Z_1', {}_\ell Z_2', {}_\ell Z_3', {}_\ell Z_4']        (7) 

in cluster ℓ’s coordinate system. Finally, the vectors must satisfy 

    \sum_{i=1}^{N} \frac{({}_k Z_i')^2}{{}_k\sigma_i^2} < N(SCLMRG) 

    \sum_{i=1}^{N} \frac{({}_\ell Z_i')^2}{{}_\ell\sigma_i^2} < N(SCLMRG) ,        (8) 

where {}_k\sigma_i^2 and {}_\ell\sigma_i^2 are the eigenvalues for the eigenvectors in cluster k and ℓ’s 
principal axis coordinate system, respectively, N = 4 is the number of channels, and 
SCLMRG (scale merge) is an input parameter into the program. Nominally, the product of 
N and SCLMRG is equal to one. If equation (8) is satisfied, then the data from both 
clusters are combined and a new set of mean vectors, covariance matrix, eigenvalues, and 
eigenvectors are calculated, and cluster 1 and 2 are said to represent a ground scene 
feature belonging to class 1. If equation (8) is not satisfied, then cluster 1 is said to 
represent class 1 and cluster 2 is said to represent class 2. All of the clusters are examined 
in the above manner and checked to see if they will merge with any of the previous 
classes. The final result is that there will be a finite set of statistics output on a tape, 
called the statistics tape, which represent the classes of distinct features found in the 
ground scene. The number of classes can equal the number of clusters, but usually the 
number of classes is less than the number of clusters due to spectral merging. The 
statistics that are saved on the statistics tape are the mean vectors, the eigenvalues, and 
the eigenvector rotation matrix for each class. 

TABLE 2. CLUSTER NUMBER AND POPULATION FOR 10 by 10 ARRAY ON TAPE 3, STRIP 2 

    Cluster    Population 
       1          187 
       2          340 
       3          153 
       4          110 
       5          310 
       6          130 
       7          243 
       8          180 
       9          242 
      10          232 
      11          100 
      12          150 
      13          110 
      14          202 
      15          294 
      16          100 
      17          140 
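
The mutual containment test can be sketched in Python as shown below. The sketch 
assumes the usual principal-axis form of the ellipse test, in which the squared 
components of the transformed vector are divided by the eigenvalues, summed, and 
compared with N times SCLMRG. The function names are hypothetical, each row of an 
eigenvector matrix is taken to be one eigenvector, and all eigenvalues are assumed to be 
greater than zero. 

```python
def to_principal_axes(vector, eigenvectors):
    """Project a vector onto each eigenvector (one eigenvector per row)."""
    return [sum(e * v for e, v in zip(axis, vector)) for axis in eigenvectors]


def inside_ellipse(z_prime, eigenvalues, n_channels, sclmrg):
    """Assumed ellipse test: sum of z'^2 / eigenvalue < N * SCLMRG."""
    total = sum(z * z / lam for z, lam in zip(z_prime, eigenvalues))
    return total < n_channels * sclmrg


def clusters_merge(mean_k, eigenvalues_k, eigenvectors_k,
                   mean_l, eigenvalues_l, eigenvectors_l, sclmrg):
    """True when each mean vector lies inside the other cluster's ellipse."""
    n_channels = len(mean_k)
    z = [a - b for a, b in zip(mean_k, mean_l)]
    in_k = inside_ellipse(to_principal_axes(z, eigenvectors_k),
                          eigenvalues_k, n_channels, sclmrg)
    in_l = inside_ellipse(to_principal_axes(z, eigenvectors_l),
                          eigenvalues_l, n_channels, sclmrg)
    return in_k and in_l
```

Because both inequalities must hold, a compact cluster does not merge with a widely 
scattered one merely because its mean falls inside the larger ellipse. 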

Table 3 shows the class assignment for each cluster along with the original mean 
vectors of each cluster. In order to check the merging procedure and for assistance in 
interpreting the results, it is helpful to plot a scatter diagram of the mean vector 
components. This is accomplished in Figures 9, 10, and 11 by plotting the mean 
amplitude of the data in channel 1 versus the mean amplitude of the data in channels 2, 
3, and 4. The location of the cluster mean vector components in the scatter diagram are 
labeled with the corresponding class number. Figure 12 is a section of the cluster tape 
printout which shows the geographic location of clusters 4 through 11 with the boundary 
points represented by dots and clusters represented by the symbols 4, 5, 6, 7, 8, 9, 0, A, 
respectively. Only 45 symbols are used for representing the clusters and in the event that 
more than 45 clusters are encountered, the symbols are recycled, i.e., cluster 46 is 
represented by a 1, cluster 47 by a 2, etc. Figure 13 is an enlargement of a portion of an 
aerial photograph of the same approximate area, which is used for corroborating the 
computer printout with the ground truth. Table 4 is a listing of the mean vector 
components associated with the classes that were obtained by merging. Since the end 
product that will be used is the classification map, the interpretation of the results is best 
accomplished by comparing both the cluster map and the classification map with the 
available ground truth. This is done in the next section. 
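
The symbol recycling described above amounts to a modulo lookup, sketched here in 
Python. The first symbols follow the order implied by the text (1 through 9, then 0, then 
A); continuing through the alphabet and the last nine punctuation symbols are 
assumptions made only to fill out a 45-symbol set, and the function name is hypothetical. 

```python
import string

# Digits 1-9 and 0, then A onward as implied by the text; the final nine
# symbols are hypothetical placeholders chosen to reach 45 symbols.
SYMBOLS = "1234567890" + string.ascii_uppercase + "+-*/=()$,"
assert len(SYMBOLS) == 45


def cluster_symbol(cluster):
    """Printout symbol for a cluster number (1-based), recycling after 45."""
    return SYMBOLS[(cluster - 1) % len(SYMBOLS)]


if __name__ == "__main__":
    for number in (4, 10, 11, 45, 46, 47):
        print(number, cluster_symbol(number))
```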

Classification, Using Composite Program 

The inputs to this section of the composite program are the raw data tape, 
boundary tape, and statistics tape, while the outputs are the classification map and area 
percentages of classification. The number of classes is limited to a total of 43, because 
of storage requirements and the fact that the map looks quite busy with that many classes. 
This limit is also maintained in the spectral merge program. Once 43 classes are obtained 
in the spectral merge program, the program stops searching for new clusters and the 
classification subroutine is called. The spectral merge program is also limited to a total of 
400 clusters, but that limit has yet to be reached. 

Figure 12. Cluster tape output. 

Figure 13. Aerial photograph corresponding to cluster tape output. 

TABLE 4. CLASS MEAN VECTOR COMPONENTS 

Channel 1    Channel 2    Channel 3    Channel 4 
 20.273       17.872       21.711       11.15 
 18.317       13.259       20.079       11.634 
 20.882       17.527       19.955       10.9 
 19.111       14.931       22.108       12.923 
 18.181       11.724       20.091       10.984 
 19.117       13.906       18.75        10.633 
 17.923       12.274       21.687       12.986 


The mean vectors, eigenvalues, and eigenvectors for each class are read and stored 
in the classification program, and the boundary tape is read simultaneously with the raw 
data tape. In order to save time and to preserve the boundaries for additional clustering, 
the boundary points and the raw data corresponding to the location of the boundary 
points are skipped. Thus, classification is attempted only for non-boundary data points, 
which typically represent about 66 percent of the data set. It also appears that the 
majority of misclassifications occur with boundary points, which is probably due to the 
fact that boundaries represent data that are in the process of changing from one feature 
to another. During this change, the data could become very much like the data belonging 
to any feature in the data set. The output classification map will then contain the 
original boundaries plus the classified data points. Physically, the classification tape will 
contain the integers -1, 0, 1, 2, ..., N, where -1 represents a boundary, 0 represents an 
unclassified data point, and 1 through N represent data points belonging to class 1 
through N, respectively. 

The decision rule for classifying a data point is almost identical to that used for 
spectral merging. The vector connecting the vector data point in question and the class mean 
vector in question is computed as shown below, 

$$[\,x_1 - {}_k\bar{x}_1,\; x_2 - {}_k\bar{x}_2,\; x_3 - {}_k\bar{x}_3,\; x_4 - {}_k\bar{x}_4\,] = [\,W_1, W_2, W_3, W_4\,] \qquad (9)$$

where $x_1, x_2, x_3, x_4$ are the vector components, or channel amplitudes, of the data point, 
and ${}_k\bar{x}_1, {}_k\bar{x}_2, {}_k\bar{x}_3, {}_k\bar{x}_4$ are the mean vector components of class k. The eigenvector 
matrix then transforms equation (9) into the principal axis coordinate system of class k 
to produce a vector with components $[\,W_1', W_2', W_3', W_4'\,]$, and the decision rule for 
classifying a vector data point is based upon the ellipse equation, 

$$\sum_{i=1}^{N} \left( \frac{W_i'}{{}_k\sigma_i} \right)^2 < 2N(\mathrm{SCLCLS}) \qquad (10)$$

The parameter SCLCLS (scale class) is an input to the program and is typically 2.25, N = 
4 is the number of channels, and ${}_k\sigma_i$ is the eigenvalue for the ith channel of the kth 
class. At first glance, it would appear to be more efficient to compute the ellipse 
equation for each class and not bother with the principal axis transformation, which has 
to be computed for each data point for each class. However, a time saving can be realized 
as the number of data channels increases. For example, if 12 channels of data are used, 
the ellipse equation will have a total of 78 second order terms to sum, some of which are 
positive and some of which are negative. By using the principal axis transformation, it is 
not always necessary to compute all the terms for each data point for each class. In the 
principal axis coordinate system, and for 12 channels, there are only 12 second order 
terms to sum for the ellipse equation and each term is positive. However, in order to 
compute each term in the ellipse equation, it is necessary to use the eigenvector rotation 
matrix and this requires a sum of 12 second order terms for each term in the ellipse 
equation. The time saving can be accomplished as follows: Equation (10) has to be 
computed for all classes every time an attempt is made to classify a data point. The 
computation is done by transforming one component of the vector and checking the sum 
against 2N(SCLCLS). If equation (10) is satisfied, another component is transformed and 
added to the previous sum, which is again compared with 2N(SCLCLS). If the sum 
exceeds 2N(SCLCLS) for a particular class before all of the components have been 
transformed, then there is no need to check the other components. The logic of the 
program then proceeds to check the other classes for the particular data point in 
question. If equation (10) is satisfied, then the value of 2N(SCLCLS) is replaced by the 
value of the sum. Thus, every time equation (10) is satisfied, the right hand side of 
equation (10) gets smaller, and fewer components need to be transformed as more classes 
are checked. This type of logic can be used because all of the terms in equation (10) are 
positive. An additional time saving can be realized by utilizing the geographical 
information from previously classified data points and assigning priorities on which classes 
to check first. The priorities for a particular data point are assigned by checking the 
status of the data point in the previous column of the same scan and the data point in 
the previous scan of the same column. If the previously classified data points belong to 
the same class, then an attempt is made to put the new data point in the same class. If 
equation (10) is satisfied, the new data point is put into that class; otherwise, all of the 
other classes have to be checked. If the previously classified data points belong to two 
different classes, an attempt is made to put the new data point in one of those two classes. 
If equation (10) is satisfied for only one of the two classes, then the new data point is put 
in that class. If equation (10) is satisfied for both of the classes, then the new data point 
is put in the class having the smallest sum for equation (10). If equation (10) is not 
satisfied for either class, then the other classes have to be checked. In both of the above 
cases, when it is necessary to check the rest of the classes and equation (10) is satisfied, 
the new data point is put in the class having the smallest sum for equation (10). If 
equation (10) is never satisfied, the new data point is left unclassified. If either or both 
of the previous data points are a boundary or unclassified, then all of the classes have to 
be checked. 
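
The early-exit test and the neighbor priority described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the report's program: the function names, the `classes` dictionary layout (mean vector, eigenvector rows, per-axis scale), and the use of the per-axis scale as the divisor in equation (10) are all hypothetical choices made for the example.

```python
def ellipse_sum(point, mean, axes, scales, limit):
    """Accumulate the left side of equation (10) one rotated component at a time.

    Returns the completed sum if it stays below `limit`; returns None as soon
    as the running total reaches `limit`, skipping the remaining components.
    """
    diff = [x - m for x, m in zip(point, mean)]          # equation (9)
    total = 0.0
    for axis, scale in zip(axes, scales):
        w_rot = sum(a * d for a, d in zip(axis, diff))   # one principal-axis component
        total += (w_rot / scale) ** 2                    # every term is non-negative
        if total >= limit:
            return None                                  # early exit
    return total


def neighbor_priority(left, above):
    """Classes to try first, from the previous column and the previous scan."""
    if left > 0 and above > 0:
        return (left,) if left == above else (left, above)
    return ()                                            # boundary (-1) or unclassified (0)


def classify(point, classes, sclcls=2.25, preferred=()):
    """Return the class number for `point`, or 0 if no class satisfies equation (10)."""
    limit = 2 * len(point) * sclcls
    best = 0
    for group in (preferred, [k for k in classes if k not in preferred]):
        for k in group:
            s = ellipse_sum(point, *classes[k], limit)
            if s is not None:
                best, limit = k, s                       # tighten the bound to the best sum
        if best:
            return best                                  # accepted within this group
    return best
```

Replacing the limit by the best sum found so far is what makes the early exit progressively cheaper: a later class must beat the current best before all of its components have been rotated, or it is abandoned.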

Figure 14 shows the classification 
map that was obtained for the 7 classes 
using a 10 by 10 cluster array, and Table 
5 shows the percentage of data points 
belonging to each category. The light 
areas on the map indicate the areas that 
are not classified. These areas are mainly 
water in the Tennessee River, urban in the 
city of Huntsville and Redstone Arsenal, 
and other cropland and cattle grazing 
areas. An enlarged portion of the 
classification map corresponding to Figure 
13 is shown in Figure 15. 
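
As an illustration of how the populations and percentages in Table 5 follow from the classification tape codes (-1 for a boundary, 0 for an unclassified point, 1 through N for the classes), the tally can be sketched as below. The function name `tally` and the in-memory list of codes are assumptions; the report's program reads the codes from tape.

```python
from collections import Counter


def tally(codes):
    """Count classification codes and express each count as a percentage.

    `codes` is an iterable of integers: -1 (boundary), 0 (unclassified),
    or 1..N (class number). Returns {label: (population, percentage)}.
    """
    counts = Counter(codes)
    total = sum(counts.values())

    def label(code):
        if code == -1:
            return "Boundaries"
        if code == 0:
            return "Unclassified"
        return f"Class {code}"

    return {
        label(code): (n, round(100.0 * n / total, 2))
        for code, n in sorted(counts.items())
    }
```

Applied to the populations of Table 5, this reproduces the tabulated percentages, for example 63,657 of 257,079 points, or 24.76 percent, unclassified.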

It was anticipated that the 
classification would probably be 
incomplete with only one classification 
pass using a 10 by 10 clustering array, so 
the composite program was actually set 
up to do two passes. This is generally done 



Figure 14. Classification map for 7 classes 
using 10 by 10 cluster array. 

TABLE 5. CLASS POPULATION AND PERCENTAGES 

Category        Population    Percentage 
Unclassified       63,657        24.76 
Boundaries         88,957        34.60 
Class 1            15,652         6.09 
Class 2             8,837         3.44 
Class 3             5,243         2.04 
Class 4            33,118        12.88 
Class 5             4,790         1.86 
Class 6             4,618         1.8 
Class 7            32,207        12.53 
Total             257,079       100.00 


with all computer runs involving clustering. The second pass uses a 5 by 5 cluster array, 
which gives a minimum cluster population of 25 data points, and instead of using the 
boundary map again, the clustering is performed on the first pass classification map. 


Remember that the cluster array can only be located and moved on data point 
locations that are represented by a zero on the boundary or classification tape. Thus, 
classified data points on the classification map will also act as boundaries to the cluster 
array, and a smaller-sized array can be used on the second clustering pass, because there 
is less chance of mixing data from two spectrally distinct features in the same cluster. 
Figure 16 shows the output of the second pass cluster tape and Table 6 lists the original 
cluster mean vectors, population, and class designation. Since there are already 7 classes, 
the cluster numbering starts with cluster 8. Figures 17, 18, and 19 are the scatter 
diagrams for the mean vector components for the clusters labeled with the corresponding 
class number. Figures 20 and 21 show enlarged portions of the second pass cluster and 
classification map, respectively, which correspond to the aerial photograph in Figure 13, 
and Figure 22 shows the entire classification map. Table 7 lists the single character 
computer symbols used to represent cluster and class number and Table 8 lists the class 
mean vector components. Table 9 lists the area percentages for each category in the 
classification map. 
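
The class mean vectors of Table 8 appear to be consistent with a population-weighted average of the cluster mean vectors of Table 6 that were merged into each class. The sketch below checks this for class 9, which is formed from clusters 9 and 15. The weighting scheme is inferred from the tabulated numbers rather than stated in the text, and the function name `merge_means` is hypothetical.

```python
def merge_means(clusters):
    """Population-weighted mean of several cluster mean vectors.

    `clusters` is a list of (population, mean_vector) pairs, one per cluster
    merged into the class.
    """
    total = sum(pop for pop, _ in clusters)
    n_channels = len(clusters[0][1])
    return [
        sum(pop * mean[i] for pop, mean in clusters) / total
        for i in range(n_channels)
    ]


# Clusters 9 and 15 from Table 6, both assigned to class 9.
class_9 = merge_means([
    (25, [24.68, 19.12, 26.36, 14.12]),
    (30, [25.267, 18.667, 26.267, 13.967]),
])
```

The result agrees with the Table 8 entry for class 9 (25.0, 18.873, 26.309, 14.036) to within the rounding of the tabulated cluster means.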

Interpretation of Results 

The objectives were to determine how well the computer classification would lend itself 
to the land use scheme presented in Circular 671, and then to produce a land use map of at 
least Level I categories. Thus, the final step consisted of examining the available aircraft 
imagery in conjunction with the cluster and classification maps, and the scatter diagrams 
of the cluster and class mean vector components. Based upon geographic location, a limited 
knowledge of the local area, and the color of the ground scene depicted in color infrared 
aircraft imagery, attempts were made to group the classes into very loosely defined 
categories of urban, water, cropland, pasture, forests, and swamp areas. Figures 23, 24, and 
25 show the result of this grouping in terms of the scatter diagram of class mean vector 
components. The symbols on the scatter diagram, U, W, C, P, F, and S, represent the 
components of the class mean vectors belonging to urban, water, cropland, pasture, forest, 
and swamp, respectively. Table 10 lists the class numbers and the tentative categories to 
which they were assigned. From the 26 classes that were obtained, 2 classes were assigned to 
cropland, 6 were assigned to forest, 6 were assigned to pasture, 10 were assigned to urban, 
1 was assigned to swamp, and 1 was assigned to water. A more specific definition of what the 
computer classes represent is attempted in the next section, where the classification maps 
are discussed. There, some of the classes are represented by the same computer symbol, and 
different computer symbols are used to provide black and white contrast. 



Figure 16. Second pass cluster map using 
5 by 5 cluster array. 




TABLE 6. MEAN VECTOR COMPONENTS AND POPULATION FOR SECOND 
PASS CLUSTERS USING A 5 by 5 ARRAY 

Cluster  Class  Channel 1  Channel 2  Channel 3  Channel 4  Population 
   8       8     23.0       14.732     44.39      26.585        41 
   9       9     24.68      19.12      26.36      14.12         25 
  10      10     27.36      22.76      32.24      16.92         25 
  11      11     28.12      23.72      33.8       18.16         25 
  12      10     27.971     23.771     32.114     16.657        35 
  13      12     26.52      22.28      30.28      15.84         25 
  14      12     26.075     20.75      31.6       16.975        40 
  15       9     25.267     18.667     26.267     13.967        30 
  16      12     25.0       20.04      29.8       16.44         25 
  17      13     26.333     21.2       27.0       13.967        30 
  18      12     25.7       20.733     29.667     15.967        30 
  19      14     25.36      21.0       28.16      14.68         25 
  20      12     26.733     21.567     30.033     15.667        30 
  21      12     26.141     20.962     28.833     15.167        78 
  22      12     26.8       22.28      30.88      16.16         25 
  23      15     25.457     19.971     27.457     14.914        35 
  24      12     27.16      22.08      30.52      15.92         25 
  25      12     26.457     21.543     30.343     16.229        35 
  26      12     26.338     21.703     30.959     16.703        74 
  27      12     26.418     21.525     30.09      15.975       122 
  28      10     27.174     22.391     31.304     16.478        25 
  29      12     26.083     21.333     30.292     16.25         25 
  30      16     26.767     22.933     34.533     18.8          30 
  31      10     27.429     22.971     32.629     17.029        35 
  32      17     24.229     16.657     43.314     25.914        35 
  33      17     23.723     15.773     43.378     25.924       119 
  34      18     25.04      21.44      32.88      18.56         25 
  35      17     24.119     16.149     43.634     26.168       101 
  36      19     26.5       21.467     31.867     17.667        30 
  37      20     18.64      12.6       12.92       5.4          25 
  38      21     20.346     12.346      6.096      0.904        52 
  39      22     19.0       17.72      28.32      16.4          25 
  40      22     19.4       17.433     28.3       17.767        30 
  41      23     22.233     16.8       32.667     19.6          30 
  42      23     23.533     18.867     30.567     16.967        30 
  43      24     23.32      20.84      28.16      15.48         25 
  44      23     22.4       18.7       30.925     17.9          40 
  45      23     22.729     18.729     30.854     17.625        48 
  46      25     21.76      13.96      43.36      27.12         25 
  47      23     22.48      18.04      31.24      18.2          25 
  48      26     22.8       17.1       36.2       21.667        30 

Figure 18. Channel 1 and 3, 5 by 5 cluster mean vector components. 








A study of the color infrared aircraft imagery presented several observations pertinent 
to the land use categories. Apparently, the month of November is not a good month for 
obtaining different types of crop classification, since most of the known crop areas 
appeared green or blue-green in color. This indicates a lack of chlorophyll, which in turn 
indicates that the crops had probably been harvested or at least had reached the end of 
their growing cycle. This would also account for the fact that there were only two classes 
associated with the locations of the crop areas, and these classes were observed along the 
entire length of the classification map. These crop classes also tended to show up in 
backyards and other urban areas. The areas which appeared to have healthy growing 
vegetation, indicated as various shades of pink or red in color imagery, correlated very 
highly with known pasture or cattle grazing areas. These areas also appeared in backyards, 
where winter grass was apparently sown, or on well kept golf courses. Some known pasture 
areas are covered with trees and these areas would be classified as forest rather than 
pasture. It was hoped that in the forest categories, it might be possible to differentiate 
between softwood and hardwood trees. However, the classes obtained for trees appear to be 
more related to the density of the tree growth. It might be possible to distinguish 
hardwood and softwood more easily if a different season of data were available and the 
training areas were manually selected. 

The classes for urban indicated 
that it might be possible, at least in the 
city of Huntsville, to distinguish in a broad 
sense residential from other types of 

Figure 22. Second pass classification map of tape 3, strip 2. 



















TABLE 8. SECOND PASS CLASS MEAN VECTORS 

Class  Channel 1  Channel 2  Channel 3  Channel 4 
   8    23.0       14.732     44.39      26.585 
   9    25.0       18.873     26.309     14.036 
  10    27.525     23.051     32.136     16.788 
  11    28.12      23.72      33.8       18.16 
  12    26.296     21.386     30.141     16.066 
  13    26.333     21.2       27.0       13.967 
  14    25.26      21.0       28.16      14.68 
  15    25.457     19.971     27.457     14.914 
  16    26.767     22.933     34.533     18.8 
  17    23.949     16.043     43.471     26.02 
  18    25.04      21.44      32.88      18.56 
  19    26.5       21.467     31.867     17.667 
  20    18.64      12.6       12.92       5.4 
  21    20.346     12.346      6.096      0.904 
  22    19.218     17.564     28.309     17.6 
  23    22.671     18.312     31.191     18.0 
  24    23.32      20.84      28.16      15.48 
  25    21.76      12.96      43.36      27.12 
  26    22.8       17.1       36.2       21.667 




















Figure 24. Channel 1 and 3 class mean vector components. 







TABLE 10. CLASSIFICATION CATEGORIES 

Class   Category      Class   Category 
   1    cropland        14    urban 
   2    forest          15    urban 
   3    cropland        16    urban 
   4    forest          17    pasture 
   5    forest          18    urban 
   6    forest          19    urban 
   7    forest          20    swamp 
   8    pasture         21    water 
   9    urban           22    forest 
  10    urban           23    pasture 
  11    urban           24    pasture 
  12    urban           25    pasture 
  13    urban           26    pasture 


urban area, if the training areas were manually selected. The residential areas tended to 
have more trees and vegetated lawns, which decreased the brightness on the color aircraft 
imagery. 

The class for water was obtained from a deep portion of the Tennessee River and 
was quite different in appearance from water occurring in shallow areas. Thus, it was not 
possible to classify water appearing in small lakes or ponds (areas smaller than a 5 by 5 
cluster size) or in small narrow tributaries and rivers branching off from the Tennessee 
River. The class for swamp was an area that contained trees which were standing in 
water. 

CLASSIFICATION, USING STAND ALONE PROGRAM 

Rather than compiling the large composite program for classifying each strip, the 
classification program was copied and written so that it could be run independently of 
the composite program. The stand alone classification program uses the raw data, 
statistics, and boundary tapes as inputs, but has the additional option of including or 
deleting boundary points in the classification analysis. The boundary points were included 
in the analysis to see how the area class percentages would change and to get an idea of 
how much error was involved in classifying boundary points. Table 11 shows the 
comparison in terms of class area percentages when the boundaries are and are not 
included in the classification analysis, and Table 12 shows the increase in area percentage 
for the six land use categories. 


TABLE 11. CHANGE IN CLASS PERCENTAGES WHEN BOUNDARY POINTS 
ARE CLASSIFIED 

                Boundaries Not Classified    Boundaries Classified 
Category or                                                             Percentage 
Class           Population   Percentage      Population   Percentage    Change 
Unclassified      15 733        6.12           25 497        9.9           3.78 
Boundaries        88 957       34.6                 0        0           -34.6 
 1                15 652        6.09           22 045        8.56          2.47 
 2                 8 837        3.44            9 362        3.64          0.2 
 3                 5 243        2.04            6 306        2.45          0.41 
 4                33 118       12.88           45 039       17.49          4.61 
 5                 4 790        1.86            5 923        2.3           0.44 
 6                 4 618        1.8             5 444        2.11          0.31 
 7                32 207       12.53           41 660       16.18          3.65 
 8                   138        0.05              237        0.09          0.04 
 9                 3 063        1.19            5 971        2.32          1.13 
10                 1 619        0.63            2 728        1.06          0.43 
11                   176        0.07              238        0.09          0.02 
12                 2 798        1.09            3 933        1.53          0.44 
13                   835        0.32            1 655        0.64          0.32 
14                 2 466        0.96            4 186        1.63          0.67 
15                   386        0.15              472        0.18          0.03 
16                 1 449        0.56            3 078        1.2           0.64 
17                   715        0.28            1 297        0.5           0.22 
18                   384        0.15              333        0.13         -0.02 
19                   565        0.22              512        0.2          -0.02 
20                   175        0.07              432        0.17          0.1 
21                   772        0.3               918        0.36          0.06 
22                 2 342        0.91            4 001        1.55          0.64 
23                15 522        6.04           37 191       14.44          8.4 
24                 9 132        3.55           17 810        6.92          3.37 
25                   480        0.19            1 139        0.44          0.25 
26                 4 918        1.91           10 133        3.93          2.02 















TABLE 12. AREA PERCENTAGE INCREASE OF LAND USE CATEGORIES 
WHEN BOUNDARIES ARE CLASSIFIED 

Land Use Category    Area Percentage Increase 
cropland                    2.88 
forest                      9.85 
pasture                    14.3 
urban                       3.64 
water                       0.06 
swamp                       0.1 
Total                      30.83 
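
The category increases in Table 12 can be reproduced by summing the per-class percentage changes of Table 11 over the category assignments of Table 10. The sketch below is only a cross-check of the tabulated numbers. The names `CATEGORY`, `CHANGE`, and `category_increase` are hypothetical, and the class 13 change is taken as 0.32 (0.64 minus 0.32).

```python
# Class -> land use category, from Table 10.
CATEGORY = {
    1: "cropland", 2: "forest", 3: "cropland", 4: "forest", 5: "forest",
    6: "forest", 7: "forest", 8: "pasture", 9: "urban", 10: "urban",
    11: "urban", 12: "urban", 13: "urban", 14: "urban", 15: "urban",
    16: "urban", 17: "pasture", 18: "urban", 19: "urban", 20: "swamp",
    21: "water", 22: "forest", 23: "pasture", 24: "pasture", 25: "pasture",
    26: "pasture",
}

# Class -> percentage change when boundary points are classified, from Table 11.
CHANGE = {
    1: 2.47, 2: 0.2, 3: 0.41, 4: 4.61, 5: 0.44, 6: 0.31, 7: 3.65, 8: 0.04,
    9: 1.13, 10: 0.43, 11: 0.02, 12: 0.44, 13: 0.32, 14: 0.67, 15: 0.03,
    16: 0.64, 17: 0.22, 18: -0.02, 19: -0.02, 20: 0.1, 21: 0.06, 22: 0.64,
    23: 8.4, 24: 3.37, 25: 0.25, 26: 2.02,
}


def category_increase(change, category):
    """Sum per-class percentage changes within each land use category."""
    totals = {}
    for cls, delta in change.items():
        name = category[cls]
        totals[name] = totals.get(name, 0.0) + delta
    return {name: round(value, 2) for name, value in totals.items()}


increase = category_increase(CHANGE, CATEGORY)
total_increase = round(sum(CHANGE.values()), 2)
```

The two negative entries (urban classes 18 and 19) are included in the urban sum, which is why the urban increase is slightly smaller than the sum of its positive class changes.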


Examination of Table 11 indicates that all classes had an increase in population 
except for two urban classes. This is probably due to the spatial priority assignment used 
in the classification program which produces an effect that is particularly noticeable 
when two classes are competing for the same area that should possibly have been merged. 
The logic used in the classification procedure starts on the left side of the map and 
moves to the right side. After a scan has been completed, the classification starts again on 
the left side of the map on the next scan. With two classes that are close together 
spectrally, the logic produces a map that has a somewhat streaked appearance in the 
northwest-southeast direction. This effect generally does not hurt the accuracy as long as 
a category or definition is used which includes the two classes, since the two classes can 
be written on the map with the same computer symbol. 

While obtaining the classification results, the original raw data tape 1 went bad, 
and therefore it is not possible to show classification results for the entire test area. The 
problem of tapes going bad was anticipated, but the preventative action taken was not 
sufficient. Initially, a reformatted tape and a copy were made from the raw data tape 
originally obtained from Goddard Space Flight Center. On occasion, however, the 
reformatted tape and its copy would be bad at the same time, and it was necessary to go 
back to the original raw data tape once too often. Therefore, it is highly recommended 
that a copy or two be made from the original tape on initial receipt of the data. 

Figure 26 shows the classification map for the eastern two thirds of the test area 
using 26 classes. The class number, computer symbol and category are listed in Table 13. 
Although it is not possible to read the symbols on the map, it is possible to distinguish 
shades of grey. The darker areas on the map represent a combination of urban, pasture, 
and cropland, except along the Tennessee River where the dark area represents water. 
The lighter shades of grey represent forest. Examination of the map reveals that classes 

TABLE 13. 26 CLASS MAP COMPUTER SYMBOLS AND CATEGORIES 

Class    Computer                      Class    Computer 
Number   Symbol       Category         Number   Symbol     Category 
   0     blank        unclassified       14       u        urban 
   1     +            cropland           15       u        urban 
   2     /            forest             16       u        urban 
   3     +            cropland           17       X        pasture 
   4     /            forest             18       u        urban 
   5     (illegible)  forest             19       u        urban 
   6     /            forest             20       *        swamp 
   7     .            forest             21       w        water 
   8     X            pasture            22       -        forest 
   9     u            urban              23       X        pasture 
  10     u            urban              24       X        pasture 
  11     u            urban              25       X        pasture 
  12     u            urban              26       X        pasture 
  13     u            urban 





representing different types of water are missing, since the shallower parts of the 
Tennessee River and tributaries are unclassified. Thus, it will be necessary to do more 
clustering and to combine some of the class symbols to produce a classification map with 
more contrast. Compare Figure 26 with Figures 2, 3, and 7. 

The definitions of the class categories are re-examined in terms of the map in 
Figure 26 in an attempt to be more specific. The definition of cropland in terms of 
classes 1 and 3 appears to be non-forested but vegetated areas, and the vegetation appears 
to have completed its growing cycle. In order for classes 1 and 3 to represent cropland 
only, it would probably be necessary to obtain seasonal ERTS data, where a plowing and 
growing cycle could be observed. Certainly, classes 1 and 3 represent areas that can be 
easily used as cropland. 

The category pasture, represented by classes 8, 17, 23, 24, 25, and 26, appears to 
be non-forested but vegetated areas, and the vegetation is still in the growing cycle. It is 
possible that some confusion may exist on the map between cropland and pasture, but 
the confusion is somewhat lessened by the high degree of correlation between known 
pasture areas shown on the map and the probability that the growing season for most 
crops has been completed. Again, seasonal data is needed to reduce the uncertainty, but 
no amount of data would help in picking out pasture areas covered with trees. Both of 
the categories, cropland and pasture, do represent areas that could easily be used for 
agricultural purposes. 




The classes representing forest (2, 4, 5, 6, 7, and 22) as a whole appear to be very 
acceptable. As subcategories of forest, the classes do appear to correlate highly with tree 
density, but the accuracy of some of the classes needs to be improved as indicated by the 
streaked forested areas on Figure 26. 

The urban classes (9, 10, 11, 12, 13, 14, 15, 16, 18, and 19) appear to represent 
various types of residential areas depending on yard space, number of trees, and density 
of houses. The unclassified part of the City of Huntsville correlates very highly with the 
business district and large shopping centers along the major thoroughfares. The business 
structures at NASA on Redstone Arsenal and the Huntsville-Decatur Jetport are also 
unclassified urban areas. 

The swamp area (class 20) is forested wetland, and water (class 21) appears to 
represent water contained in the deep channel of the Tennessee River, which has a 
different turbidity from the rest of the water in the river. The categories of cropland, 
pasture, forest, urban, swamp, and water will continue to be used, but with the above 
mentioned qualifications. Misclassification occurring in some of the boundary areas is 
apparent, but it does not seem to present a serious problem. 

Tables 14 and 15 list the population and percentages for each class contained on 
the two data tapes that were divided into four strips each, while Table 16 lists the 
population and percentages for each land use category as a function of tape and strip. 
Table 17 lists the land use category population and area percentages for all of Figure 26. 

In order to obtain additional classes for water, tape 2 strip 4 was input to the 
clustering program and the location of the newly acquired clusters is shown in Figure 27. 
Table 18 lists the mean vectors for the new clusters, assigned class number and the 
population for each cluster, while Table 19 lists the mean vectors for the new classes and 
land use categories. 

FINAL RESULTS 

There are four basic types of output that can be used for examining the 
classification maps and these are computer printout, Xerox microfilm copy, the microfilm 
negative itself, and photographs made from the microfilm negatives. Thus, the maps can 
be examined at a wide variety of scales. However, no attempt has been made to produce 
an output that is geographically correct. The main reason for not attempting to scale the 
data for geographic accuracy is that the scale would probably have to be changed, in two 
directions, every time data from a different set of scanner data is used. In one direction 
the scale would depend on the data sampling rate (samples/scan/channel), and in a 
direction perpendicular to that, the scale would depend upon the scanning rate. The 
second reason is that the computer characters used on the map are generally longer than 
they are wide and there is no readily available way to easily vary their dimensions. The 
data associated with one ERTS image contains 3,240 samples/scan and 2,340 scans, so that 



TABLE 14. CLASS POPULATION AND AREA PERCENTAGES FOR TAPE 2 

Tape 2 
                    Strip 1              Strip 2              Strip 3              Strip 4 
Class          Population Percent   Population Percent   Population Percent   Population Percent 
unclassified     32 884    12.77      30 620    11.89      33 012    12.82      32 674    12.56 
 1               11 799     4.58      11 835     4.6       15 498     6.02      20 671     7.95 
 2                8 872     3.44       5 127     1.99       5 239     2.03       4 854     1.87 
 3                7 628     2.96       6 732     2.61       7 137     2.77      14 671     5.64 
 4               54 756    21.26      54 120    21.01      44 544    17.3       42 532    16.35 
 5                3 718     1.44       3 499     1.36       4 771     1.85       4 583     1.76 
 6                6 303     2.45       6 450     2.5        5 387     2.09       4 940     1.9 
 7               48 474    18.82      51 798    20.11      43 145    16.75      32 459    12.48 
 8                  161     0.06         153     0.06         175     0.07         295     0.11 
 9                6 408     2.49       6 516     2.53       7 613     2.96       7 928     3.05 
10                  248     0.1          217     0.08         850     0.33         810     0.31 
11                   15     0.01          18     0.01          94     0.04          72     0.03 
12                  823     0.32         874     0.34       1 999     0.78       2 141     0.82 
13                  717     0.28         606     0.24       1 292     0.5        1 665     0.64 
14                3 022     1.17       3 258     1.26       4 283     1.66       4 505     1.73 
15                  442     0.17         455     0.17         646     0.25         653     0.25 
16                1 207     0.47       1 156     0.45       1 989     0.77       2 107     0.81 
17                  919     0.36         945     0.37       1 393     0.54       1 726     0.66 
18                  489     0.19         389     0.15         440     0.17         442     0.17 
19                  313     0.12         277     0.11         437     0.17         448     0.19 
20                  178     0.07         319     0.12         156     0.06         284     0.11 
21                1 656     0.64         334     0.13       1 513     0.59       1 204     0.46 
22                3 511     1.36       3 220     1.25       1 970     0.76       1 442     0.55 
23               34 019    13.21      38 708    15.03      40 936    15.89      40 616    15.62 
24               19 882     7.72      19 865     7.71      19 773     7.68      21 614     8.31 
25                  642     0.25         816     0.32       1 143     0.44       1 380     0.53 
26                8 464     3.29       9 253     3.59      12 116     4.7       13 354     5.13 















TABLE 15. CLASS POPULATION AND AREA PERCENTAGES FOR TAPE 3 

Tape 3 
                    Strip 1              Strip 2              Strip 3              Strip 4 
Class          Population Percent   Population Percent   Population Percent   Population Percent 
unclassified     23 487     9.12      25 507     9.9       23 059     8.95      32 247    12.01 
 1               27 420    10.65      22 045     8.56      18 674     7.25      10 863     4.18 
 2                5 662     2.2        9 362     3.64      18 618     7.23      23 410     9.00 
 3               10 023     3.89       6 306     2.45       5 826     2.26       4 078     1.57 
 4               45 020    17.48      45 039    17.49      48 054    18.66      49 825    18.16 
 5                5 015     1.95       5 923     2.3        4 129     1.6        5 146     1.98 
 6                3 803     1.48       5 444     2.11       4 887     1.9        6 780     2.61 
 7               41 880    16.26      41 660    16.18      45 438    17.64      58 114    22.34 
 8                  178     0.07         237     0.04         344     0.13         211     0.08 
 9                5 164     2.01       5 971     2.32       3 890     1.51       3 204     1.23 
10                  572     0.22       2 728     1.06         905     0.35         269     0.1 
11                   39     0.02         238     0.09          61     0.02          17     0.01 
12                1 226     0.48       3 933     1.53       1 043     0.4          586     0.23 
13                1 062     0.41       1 655     0.64         697     0.27         776     0.3 
14                3 591     1.39       4 186     1.63       2 601     1.01       2 364     0.91 
15                  349     0.14         472     0.18         248     0.10         174     0.07 
16                1 986     0.77       3 078     1.2        2 076     0.81         760     0.29 
17                1 052     0.41       1 297     0.5        1 625     0.63         867     0.33 
18                  353     0.14         333     0.13         306     0.12         149     0.06 
19                  300     0.12         512     0.2          225     0.09         118     0.05 
20                  530     0.21         432     0.17         166     0.06         165     0.06 
21                1 129     0.44         918     0.36         777     0.3          884     0.34 
22                3 205     1.24       4 001     1.55       5 187     2.01       7 968     3.06 
23               40 755    15.82      37 191    14.44      37 593    14.6       28 845    11.09 
24               21 539     8.36      16 810     6.92      16 379     6.36      13 053     5.02 
25                  923     0.36       1 139     0.44       2 053     0.8        2 225     0.86 
26               11 287     4.38      10 133     3.93      12 689     4.93       7 997     3.07 
TABLE 16. LAND USE CATEGORY POPULATION AND AREA PERCENTAGES FOR TAPES 2 AND 3

Tape 2

                   Strip 1           Strip 2           Strip 3           Strip 4
Category      Population Percent Population Percent Population Percent Population Percent
unclassified      32 884   12.77     30 620   11.89     33 012   12.82     32 674   12.56
cropland          19 427    7.54     18 567    7.21     22 635    8.79     35 342   13.59
forest           125 634   48.77    124 214   48.22    105 056   40.78     90 810   34.91
pasture           64 087   24.89     69 740   27.08     75 536   29.32     78 985   30.36
urban             13 684    5.32     13 756    5.34     19 642    8.63     20 811     9.0
swamp                178    0.07        319    0.12        156    0.06        274    0.11
water              1 656    0.64        334    0.13      1 513    0.59      1 204    0.46

Tape 3

                   Strip 1           Strip 2           Strip 3           Strip 4
Category      Population Percent Population Percent Population Percent Population Percent
unclassified      23 487    9.12     25 507     9.9     23 059    8.95     31 247   12.01
cropland          37 443   14.54     28 351   11.01     24 500     [?]     14 491    5.75
forest           103 865   40.61    111 429   43.27    126 313   49.04    151 243   58.15
pasture           75 734    29.4     67 807   26.32     70 683   27.45     53 198   20.45
urban             14 642     5.7     23 106    8.98     12 052    5.68      8 422    3.25
swamp                530    0.21        432    0.17        166    0.06        165    0.06
water              1 129    0.44        918    0.36        777     0.3        884    0.34
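
The overall category totals listed in Table 17 can be reproduced by summing the eight strip populations of Table 16. The following is a minimal sketch for three of the categories; the dictionary and function names are illustrative and do not come from the programs described in this report.

```python
# Strip populations transcribed from Table 16:
# Tape 2 strips 1-4 followed by Tape 3 strips 1-4.
STRIP_POPULATIONS = {
    "cropland": [19427, 18567, 22635, 35342, 37443, 28351, 24500, 14491],
    "forest": [125634, 124214, 105056, 90810, 103865, 111429, 126313, 151243],
    "pasture": [64087, 69740, 75536, 78985, 75734, 67807, 70683, 53198],
}


def category_totals(populations):
    """Sum the per-strip populations of each land use category."""
    return {category: sum(counts) for category, counts in populations.items()}


totals = category_totals(STRIP_POPULATIONS)
for category, total in totals.items():
    print(f"{category:10s} {total:>9,d}")
```

The same summation applies to the remaining categories once their strip populations are entered.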





TABLE 17. TOTAL POPULATION AND AREA PERCENTAGES FOR LAND USE CATEGORIES ON TAPES 2 AND 3

Category        Population
unclassified       232 490
cropland           200 756
forest             938 564
pasture            555 770
urban              126 115
swamp                2 220
water                9 415


Figure 27. Cluster map for tape


TABLE 18. CLUSTER STATISTICS FOR TAPE 2, STRIP 4

Cluster      27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
Population   25  99  35  47  1756  734


TABLE 19. CLASS STATISTICS FOR TAPE 2, STRIP 4

Channel 1   Channel 2   Channel 3   Channel 4
   27.873      26.629      27.113      12.71
   28.771      28.886      31.2        15.457
   26.638      21.468      41.362      23.255
   25.488      18.555       9.813       1.862
   25.229      22.052      12.959       2.467
   25.862      23.717      17.105       4.467
   32.067      28.7        29.533      13.333
   28.2        24.657      23.086       9.829
   25.54       19.81       12.121       2.534
   27.776      24.605      18.273       4.782
   25.375      20.875      13.958       4.0
   31.36       28.48       27.16       11.52


the data set is approximately 1.38 times wider than it is long, even though the sides of
the image are of equal length. The computer characters tend to compensate for this
effect, since they are longer than they are wide, and a map results that is not too badly
distorted for ERTS data.

In order to get an idea of the scales involved, the different types of computer
output were compared with the 1:24,000 scale TVA maps. The east-west scale of the
computer printout is 0.84 times that of the TVA maps, while the north-south scale of the
printout is 1.56 times that of the TVA maps. Thus, the classification map on the
computer printout will be 1.86 times longer than it should be. The Xerox copy is 0.5
times as wide and 0.61 times as long as the TVA maps, or 1.22 times wider than it should
be. The microfilm is approximately 0.055 times as long and 0.044 times as wide as the
TVA maps, while the photography obtained from the microfilm negatives can
be made to vary in scale. Table 20 gives the approximate scales for ERTS data and the
standard map outputs. The Xerox map for the area analyzed on the computer is
approximately 7 feet (2.1 meters) by 8 feet (2.5 meters), while the printout map is
approximately 18 feet (5.5 meters) by 13.3 feet (4.1 meters). In area size, the computer
test area was approximately 60 miles (96 kilometers) by 66.5 miles (106.4 kilometers), or
3,990 square miles (10,214 square kilometers).


TABLE 20. APPROXIMATE SCALES FOR CLASSIFICATION MAPS

                          Approximate Scale
Output        North-South Direction   East-West Direction
Printout      1:15,385                1:28,571
Xerox Copy    1:39,344                1:48,000
Microfilm     1:435,538               1:536,360
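
The printout and Xerox entries of Table 20 follow directly from the size ratios quoted in the text relative to the 1:24,000 TVA maps. The sketch below shows the arithmetic; the function and variable names are illustrative only, and the microfilm ratios are left out because they are quoted in the text only to two significant figures.

```python
TVA_SCALE_DENOMINATOR = 24_000  # reference maps are at 1:24,000


def scale_denominator(size_ratio, reference=TVA_SCALE_DENOMINATOR):
    """Scale denominator of an output whose linear size is
    size_ratio times that of the reference map."""
    return round(reference / size_ratio)


# Ratios quoted in the text (output size relative to the TVA maps).
printout_ns = scale_denominator(1.56)  # north-south direction
printout_ew = scale_denominator(0.84)  # east-west direction
xerox_ns = scale_denominator(0.61)
xerox_ew = scale_denominator(0.5)

# Relative elongation of the printout in the north-south direction.
printout_elongation = round(1.56 / 0.84, 2)

print(f"Printout   1:{printout_ns:,}  1:{printout_ew:,}")
print(f"Xerox copy 1:{xerox_ns:,}  1:{xerox_ew:,}")
print(f"Printout elongation: {printout_elongation}")
```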


Figure 28 is a four category map, with water and swamp represented by the
symbol 0, cropland and pasture represented by X, urban by ., forest by /, and blank
representing unclassified data. The categories were combined down to four in order to
produce some contrast.

In order to see more detail, the next three maps have at most only three 
categories. Figure 29 shows the categories water, swamp and urban represented by W, ., /, 
respectively, while Figure 30 shows water (no swamp), pasture, and cropland represented 
by /, X, and ., respectively. Figure 31 shows water (no swamp) and forest represented by 
W and /. Table 21 shows the population and area percentages for each land use category 
for the entire data set. Compare Figures 28 through 31 with Figures 2, 3, 7, and 26. 

In order to have a comparison of time estimates in making the land use map, it 
will again be mentioned that the programs were run on an IBM-7094/44, but that no 
recommendation is being made for using any particular computer system. The computer 
time includes compilation and printing time. The boundary program runs approximately 
224 samples/second and took 4 hours to produce the map in Figure 6. It would not be 
generally necessary to get boundary maps for the entire test area, but the maps are 
presently being examined for other purposes. The composite program, run on tape 3 strip 
2 producing Figures 8, 14, 16, and 22, ran for 57 minutes. The length of time that the 
composite program runs will depend considerably on the number of clusters and classes, 
which in this case were 48 and 26, respectively. 








Figure 29. Map showing water, swamp, and urban categories.


















TABLE 21. LAND USE CATEGORY POPULATION AND AREA PERCENTAGES

Category        Population   Percentage
unclassified       154 140         7.47
cropland           193 832         9.39
forest             938 885        45.51
pasture            551 287        26.72
urban              180 599         8.75
swamp                2 160         0.1
water               42 547         2.06


The map shown in Figure 26 was produced by using the stand alone classification
program and took 3 hours 16 minutes and 40 seconds. The running time of this program
also depends on the number of classes, but for 26 classes it ran approximately 175
samples/second. The additional clustering performed on tape 2, strip 4 shown in Figure
27 took 34 minutes and 43 seconds. The classification program used to produce the 4
category land use map shown in Figure 28 took 3 hours 49 minutes and 4 seconds. For
38 classes the program runs approximately 150 samples/second. Normally, Figure 28
would be the end product and the analysis would stop there, but, to provide visibility to
the accuracy of the results, Figures 29, 30, and 31 were generated. These were generated
with a program which does nothing but assign computer symbols to the classes
contained on the classification tape. It took 1 hour 55 minutes and 12 seconds to
produce each figure, and the program ran at approximately 448 samples/second. The
total amount of computer time used to produce the results shown in Figures 6, 8, 14, 16, 22,
26, 27, 28, 29, 30, and 31 came to 19 hours 23 minutes and 3 seconds.
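
The quoted classification rates can be checked against the run times. The sketch below assumes the total sample count: the Tape 2 columns of Table 16 sum to 257,550 samples for strips 1 through 3 and 260,100 for strip 4, and Tape 3 is assumed here to have the same strip sizes. The helper names are illustrative.

```python
def seconds(hours=0, minutes=0, secs=0):
    """Convert a run time to seconds."""
    return hours * 3600 + minutes * 60 + secs


# Assumption: six strips of 257,550 samples and two of 260,100 samples.
TOTAL_SAMPLES = 6 * 257_550 + 2 * 260_100


def samples_per_second(n_samples, run_seconds):
    """Average processing rate over a complete run."""
    return n_samples / run_seconds


# 26-class run (Figure 26) and 38-class run (Figure 28).
rate_26 = samples_per_second(TOTAL_SAMPLES, seconds(3, 16, 40))
rate_38 = samples_per_second(TOTAL_SAMPLES, seconds(3, 49, 4))

print(f"26 classes: {rate_26:.0f} samples/second")
print(f"38 classes: {rate_38:.0f} samples/second")
```

Under this assumption the rates come out close to the 175 and 150 samples/second quoted above.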


Composite Sequential Clustering Program (CSCP) 

BRIEF HISTORY OF THE PROGRAM 

The composite sequential clustering algorithm (CSCP) is a combination of two 
algorithms that have been in use for some time. These are a sequential clustering 
algorithm and a K-means algorithm. The contribution cited in Reference 4 was the 
interfacing of the two algorithms in one computer program. 
Initially the decision was made to develop a classification program that was
entirely sequential, i.e., only one pass through the data was required. Certain drawbacks
exist in allowing only one pass, however. In a sequential algorithm classes are initiated,
data points are added, and statistics are continuously updated. The net result is that a
sample classified into a given class near the beginning of the classification map might not
be classified into the same class if it occurred near the end of the classification map. This
can cause a considerable reduction in classification accuracy. With classification accuracy
as a desired feature, a literature survey was undertaken. The K-means algorithm in its
original form used an initial guess of class mean vectors and then iterated to improve
classes. The better the initial guesses were, the faster the convergence to final
classifications was. The need for initial classes prompted the use of a sequential algorithm
to generate initial classes. The CSCP was the result. The sequential program generated
these initial classes, and the K-means algorithm iterated to improve classification
accuracy.

DESCRIPTION 

The CSCP consists of the two main sections mentioned above: the sequential
portion and the K-means portion. The sequential clustering section begins by establishing
an initial class. As written, the program begins by reading and storing in the XDATA
array the first six points in the sample sequence. The mean vector for these six points is
calculated and then the parameter $\Delta X_{i1}^2$ is calculated for each sample, where

$$\Delta X_{i1}^2 = \sum_{k=1}^{K} \left( X_{i,k} - \bar{X}_{1,k} \right)^2$$

i refers to the sample number, 1 to the first class, and k to the channel. The maximum
value of $\Delta X_{i1}^2$ for the six samples is found, and if

$$\frac{\Delta X_{\max}^2}{\sum_{k} \bar{X}_{1,k}^2} < \mathrm{THRESH} \qquad (11)$$

then the six samples will be designated as a class. THRESH is an input parameter set to
0.75 for all cases reported in this report. If the test is not met then the first sample in
the sequence is discarded. A new sample is read in and the test is performed again with
the revised six samples. This process is continued until equation (11) is satisfied, which
means that a class is formed. Once the first class is formed, each following sample point
is checked using two tests, a Chi-square and normal test, to see if it belongs to the
established population. These tests are discussed at length in Reference 4 and will not be
repeated here. If the sample does belong to the established population, the class statistics
will be updated. If the sample does not satisfy the two tests it is placed into the XDATA
array. If the number of points in the XDATA array is six, these six points will be
subjected to the test of equation (11) to see if they constitute a new class. If not, the first
sample put in the XDATA array is discarded and the process continues. Eventually, a number
of classes will be established and each sample will be either used to update the statistics
of the class to which it belongs or the sample will be put into the XDATA array, and
possibly be used to establish a new class.
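
The search for the initial class can be sketched as a sliding window of six samples. The form of the homogeneity test used below (largest squared deviation from the window mean, divided by the squared magnitude of the mean vector) is an assumption about equation (11), and the function names are hypothetical; the Chi-square and normal tests applied to later samples are not shown.

```python
from collections import deque

WINDOW = 6      # number of samples held in the XDATA array
THRESH = 0.75   # value of the input parameter used in this report


def mean_vector(samples):
    """Channel-by-channel mean of a list of samples."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]


def passes_seed_test(samples, thresh=THRESH):
    """ASSUMED form of equation (11): the largest squared deviation from
    the window mean, relative to the squared magnitude of the mean."""
    mean = mean_vector(samples)
    max_dev = max(sum((x - m) ** 2 for x, m in zip(s, mean)) for s in samples)
    norm = sum(m * m for m in mean)
    return norm > 0 and max_dev / norm < thresh


def find_first_class(sample_stream, thresh=THRESH):
    """Slide a six-sample window along the stream, discarding the oldest
    sample each time the test fails. Returns (index of the first sample
    in the accepted window, class mean), or None if no window passes."""
    window = deque(maxlen=WINDOW)  # appending to a full deque drops the oldest
    for index, sample in enumerate(sample_stream):
        window.append(sample)
        if len(window) == WINDOW and passes_seed_test(list(window), thresh):
            return index - WINDOW + 1, mean_vector(list(window))
    return None
```

A stream that begins with one anomalous sample followed by six similar samples would fail the test on the first window and pass on the second, once the anomalous sample has been discarded.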

If the number of classes exceeds the input value of MAXP0P, subroutine REDP0P
is called to merge the two most “similar” classes. The (symmetric) distance matrix
composed of the Euclidean distances between each of the class means is calculated. The
two classes whose Euclidean distance between centroids is the smallest of all the
elements of the distance matrix are the two classes that are merged. The merging
procedure is continued as necessary until the end of the raw data tape is encountered.
When this occurs the raw data tape is rewound and the class mean vectors are used as
inputs to the K-means portion of the program.
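
The merging step can be illustrated with a short sketch. The selection of the closest pair of class means follows the description above; how the merged statistics are formed is not stated in this section, so the population-weighted mean used here is an assumption, and the function names are hypothetical rather than those of the actual subroutine.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(means):
    """Indices (i, j) of the two class means separated by the smallest
    Euclidean distance."""
    return min(
        combinations(range(len(means)), 2),
        key=lambda pair: euclidean(means[pair[0]], means[pair[1]]),
    )


def merge_closest(means, counts):
    """Merge the two most similar classes.
    ASSUMPTION: the merged mean is the population-weighted mean."""
    i, j = closest_pair(means)
    total = counts[i] + counts[j]
    merged = [(counts[i] * a + counts[j] * b) / total
              for a, b in zip(means[i], means[j])]
    new_means = [m for k, m in enumerate(means) if k not in (i, j)] + [merged]
    new_counts = [c for k, c in enumerate(counts) if k not in (i, j)] + [total]
    return new_means, new_counts


def reduce_classes(means, counts, max_classes):
    """Merge repeatedly until no more than max_classes classes remain."""
    while len(means) > max_classes:
        means, counts = merge_closest(means, counts)
    return means, counts
```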

The K-means portion of the program takes the cluster centers, either input to the
program on computer cards or calculated by the sequential part of the program, and
iterates through the data NITER times (NITER was set to 3 for this report) to improve
classification. How the program can be used with means input to the program on cards
will be discussed in a later section.

The iteration takes the following form. The Euclidean distance squared from the
sample to each of the class means is calculated and the sample is classified into the
class for which the squared distance is the least. Statistics for this class are updated to
include the new sample. Once the end of the raw data tape is reached, control is shifted
to subroutine CMAP which prints out a classification map and the iteration statistics. If the
maximum number of iterations (NITER = 3) has not been exceeded, then the new class
means are used to begin the K-means classification again. Note that in beginning the
K-means iteration the means from the previous iteration are used to classify, and
completely new statistics are being calculated. The means from the previous iteration are
not continuously updated, but are completely recalculated. Once the end of the sample
sequence is reached and the number of iterations equals NITER, the final classification
map and statistics are printed out. One last option was generally utilized in the
preparation of this report. If the input parameter ND0UBL is set to 1, a double track
boundary map is printed out. This map prints out only the border of areas of
homogeneous classification, and leaves the interior of the area blank. In many cases this
allows easy location of features that would be hard to find otherwise.
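
The iteration described above can be sketched as follows. Each pass classifies every sample with the means of the previous pass held fixed and accumulates completely new statistics. The function names are illustrative, and the handling of a class that receives no samples (its previous mean is kept) is an assumption not covered in the text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between a sample and a class mean."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pass(samples, means):
    """One pass through the data: classify with the fixed means of the
    previous iteration, then recompute every class mean from scratch."""
    n_channels = len(means[0])
    sums = [[0.0] * n_channels for _ in means]
    counts = [0] * len(means)
    labels = []
    for sample in samples:
        label = min(range(len(means)),
                    key=lambda c: squared_distance(sample, means[c]))
        labels.append(label)
        counts[label] += 1
        for k, value in enumerate(sample):
            sums[label][k] += value
    # ASSUMPTION: a class that received no samples keeps its previous mean.
    new_means = [
        [s / counts[c] for s in sums[c]] if counts[c] else list(means[c])
        for c in range(len(means))
    ]
    return labels, new_means


def kmeans(samples, initial_means, niter=3):
    """Repeat the pass niter times (NITER was 3 for this report)."""
    means = [list(m) for m in initial_means]
    labels = []
    for _ in range(niter):
        labels, means = kmeans_pass(samples, means)
    return labels, means
```

Because the means are held fixed within a pass, the classification of a sample does not depend on its position in the sample sequence, which is the drawback of the purely sequential approach that the iteration is meant to remove.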

PROGRAM CHANGES

Most of the material in the previous section is contained in Reference 9.
Many of the options available in the program were not discussed for the
simple reason that they were not used in the preparation of this report. Since the
publication of Reference 9, a few changes in the program have been made which will
now be described. One major change concerns the Chi-square tables used in the test in
the sequential part of the program. The original data used tables for a 99 percent level of
confidence. For all data analyzed up to that time, the 99 percent tables had been
adequate. When the analysis of ERTS data of the Huntsville area was undertaken, only
three or four classes were established, which was highly inadequate. The need for lower
confidence limit tables was not anticipated and the 99 percent tables were being input at
compile time with a DATA statement. So that the program would not need changing
each time new tables were used, the program was revised to read in the tables with a
NAMELIST statement. In addition, new Chi-square tables for a level of confidence of 80
percent replaced the old tables. Most statistics books have Chi-square tables for from 1 to
30 degrees of freedom (n) and perhaps for other selected values of n (e.g., n = 50, 100,
...). The program requires values of n from 1 to 201. The fact that the Chi-square
distribution approaches the normal distribution for large values of n is well known. Some
functions of n and Chi-square approach the normal distribution much faster than
Chi-square itself so that approximate values of Chi-square for any value of n can be
calculated. One particularly good approximation for Chi-square described in Reference
12 is



(12)


where Z is the normal deviate for the given level of confidence. For example, for the 80
percent level of confidence, $\chi^2_{n,.900}$ and $\chi^2_{n,.100}$, $Z_{.900}$ is approximately -1.28 and
$Z_{.100}$ is +1.28. To generate the 80 percent tables, these values of Z were used in the
approximation of equation (12).
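
As an illustration of an approximation of this kind, the sketch below uses the Wilson-Hilferty cube-root formula, a well-known approximation in which a function of Chi-square and n is treated as normally distributed. It is used here as an assumption and may not be the exact form of equation (12); the function names and the table layout are hypothetical.

```python
import math


def chi_square_wilson_hilferty(n, z):
    """Approximate Chi-square value for n degrees of freedom at the
    probability level whose normal deviate is z (Wilson-Hilferty)."""
    term = 2.0 / (9.0 * n)
    return n * (1.0 - term + z * math.sqrt(term)) ** 3


def build_table(z, max_n=201):
    """Table of approximate values for n = 1 .. max_n."""
    return [chi_square_wilson_hilferty(n, z) for n in range(1, max_n + 1)]


# Lower and upper limits for the 80 percent level of confidence.
lower_table = build_table(-1.28)
upper_table = build_table(+1.28)

# Entry for n = 10 (index 9); tabulated values are about 4.87 and 15.99.
print(round(lower_table[9], 2), round(upper_table[9], 2))
```

For n = 10 the approximation agrees with standard tables to within a few hundredths, which suggests why a formula of this type would be adequate for generating values up to n = 201.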

ANALYSIS PROCEDURES 

The general problem addressed in automated land use classification with
unsupervised classification programs is to generate classes that can be easily interpreted.
The CSCP can be used in either an unsupervised or in a supervised mode. In the
unsupervised mode no human data interpretation is required prior to a run. In the
supervised mode, subjective human judgements are required before the data is analyzed.
When using the CSCP on satellite data, such as the ERTS data, where lighting conditions
are relatively uniform across the entire image, the following procedure is generally
followed. A small area is initially clustered using the unsupervised mode.
The clustering consists of establishing classes with the sequential section and then
iterating to improve the classes with the K-means algorithm. The second step requires
subjective human judgements concerning the interpretation of the classes. The aim in the
preparation of this paper was to be able to interpret classes corresponding to Level I
categories in the classification scheme outlined in the USGS Circular 671. In the
data analyzed, the initial step resulted in 15 classes. The next step was to compare the
classification map with existing aerial photography, which in this case was the
high-altitude RB-57 data used in making the photomosaic. From this analysis, two classes
were found to be combinations of two or more useful classes and could not readily be
given a Level I name. The decision was made to completely eliminate these two classes
from further consideration. Discarding the 2 classes left 13 classes to be used in the next
phase of the analysis. With only these mean vectors as input to the CSCP and MAXP0P
set to 13, it is now possible to completely bypass the sequential part of the CSCP. To do
this, MBYPASS is set to one (previously zero), and the mean vectors are input as the last
data cards in the deck (read in using a 4F6.0 format).

To summarize, a small area is initially clustered and the resulting
classes are interpreted. Next, the analysis is extended by using the interpreted results
from the small area to classify a much larger area. Since the sequential algorithm is
bypassed, much less time is required in analyzing the new data than was
required in clustering. Relative times are discussed in the Critique Section.

FINAL RESULTS 

Figure 32 is the section of the photomosaic which was analyzed by the CSCP. A 
photograph of the computer map of the complete area analyzed is depicted in Figure 33, 
which is approximately 30 miles wide and 26.5 miles long. The 13 classes have been 
given five Level I names. These are (in order of increasing brightness in Figure 33): (1) 
agricultural, pasture; (2) forest; (3) agricultural, cropland; (4) water; and (5) urban. 
Figures 34 through 37 show (1) pasture and water, (2) forest and water, (3) cropland and 
water, and (4) urban and water. For each of these figures all the symbols but the two of 
interest were blanked out. 

The data analyzed was from E-ERTS-1 104-15552 tape 3. The scans were 256-763
and samples (across) were 1-810. Hence, the total area analyzed included 810 x 508 =
411,480 pixels (picture elements), each approximately 57.1 meters wide by 79.1 meters
long (in the scan direction). Each symbol, in Figures 33 through 37, occupies a square
area so the scale is larger along the scan. This area is approximately 1,800 square
kilometers.
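
The pixel count and ground area quoted above follow from the scan and sample ranges and the nominal pixel dimensions. A minimal sketch of the arithmetic, with illustrative variable names:

```python
SCAN_FIRST, SCAN_LAST = 256, 763      # scan range analyzed
SAMPLE_FIRST, SAMPLE_LAST = 1, 810    # sample range (across)
PIXEL_WIDTH_M = 57.1                  # approximate pixel width, meters
PIXEL_LENGTH_M = 79.1                 # approximate pixel length, meters

n_scans = SCAN_LAST - SCAN_FIRST + 1          # inclusive range
n_samples = SAMPLE_LAST - SAMPLE_FIRST + 1
n_pixels = n_scans * n_samples

# Ground area in square kilometers (1 km^2 = 1e6 m^2).
area_km2 = n_pixels * PIXEL_WIDTH_M * PIXEL_LENGTH_M / 1e6

print(n_scans, n_samples, n_pixels)
print(f"{area_km2:.0f} square kilometers")
```

The result is somewhat above 1,850 square kilometers, consistent with the rounded figure of approximately 1,800 given in the text.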

Comparison of Figures 32 and 33 indicates that overall the classification was 
fairly accurate. Some problem areas do exist, but the relative size of these areas is small 
compared to the area classified. One such problem area is the urban classification of 
residential areas with a high density of trees. These areas may be classified as agriculture 
(pasture or cropland). Other problems associated with the classes will be discussed in 
more detail where the selected areas are presented. 

Figure 32. Aerial photograph of test area.

Figure 33. Computer map of test area.

Figure 34. Computer map of pasture and water categories.

Figure 35. Computer map of forest and water categories.

Figure 37. Computer map of urban and water categories.





The input class mean vectors for the final classification are shown in Table 22.
The statistics for the classes for the final classification are shown in Table 23. The mean
vectors of Table 22 are the mean vectors from the initial clustering run. The statistics
depicted in Table 23 are the output of the classification iteration.

Detailed discussion of the entire 411,480 pixel classification will not be 
undertaken. Instead, two specific areas of the map will be considered. They are the area 
including and surrounding Marshall Space Flight Center and Redstone Arsenal, and the 
Jones Valley area of Huntsville. These areas were selected mainly on the basis of 
familiarity and the easy availability of ground truth. 

The MSFC area will be considered first. Figure 38 is a blow up from one of the
RB-57 prints showing this area. The color print corresponding to Figure 38 is shown in 
Figure 13. Figure 39 is the computer map with only the five Level I symbols used. 
Figure 40 is the double-track boundary map of the same area with different symbols for 
each of the forest, urban, and cropland categories. Recall that the blank areas are 
members of the class whose symbol surrounds them. Within the limits of resolution of 
the data, the classification is fairly good. The only real problem is the separation of the 
forest and the cropland categories. The cropland category is actually cleared land that 
does not have lush, low vegetation growing on it. This is not the Level 1 description of 
the class, but is actually the category classified by the CSCP. Frequently, this is cropland 
(e.g. cotton), but at times it may be land that was formerly used for crops but was 
recently abandoned. 

The last case is transitional between cropland and forest land. As the land is 
abandoned probably the first trees to come in (in the Huntsville area) are red cedar 
(Juniperus virginiana) or possibly loblolly pine (Pinus taeda). The initial growth will be
sparse and might appear to a remote sensor aboard ERTS to be either forest or crops. 
Another type of area similar to the above case is grassland planted with tree seedlings. 
The Army in many areas on Redstone Arsenal has planted pine seedlings. The trees are 
planted at low density for rapid growth. Even visual discrimination between these areas 
and some cropland on the RB-57 data is sometimes difficult. 

From Figure 40 the forest categories R and Z appear to have the most dense growth.
Class T is less dense and class F is the least dense. Lighting conditions on the slopes of
the mountains also have a considerable effect. Dense forest on a sunny southeastern slope 
may appear similar to a less densely forested area located upon level topography. 

The area just north of the 4200 building complex (MSFC area) is a mowed grass 
area with a north-south running line of trees (Fig. 38). Since from the RB-57 data this 
area looks similar to crop land the area should have been classified as crop. Instead, it 
was classified as the least dense, F forest category. The reason F is named forest rather 
than crop is that the class is found frequently in dense forest areas on sunny mountain 
slopes. 

TABLE 22. INPUT MEAN VECTORS FOR THE K-MEANS SECTION OF THE CSCP

Class No.   Channel 1   Channel 2   Channel 3   Channel 4
    1          20.804      12.958       7.166       1.560
    2          22.645      16.662      35.261      20.7?1
    3          20.279      15.884      26.801      15.770
    4          23.055      15.818      41.382      24.664
    5          24.683      21.429      28.969      15.322
    6          20.234      15.957      17.980       9.159
    7          23.438      21.395      24.490      12.448
    8          30.222      27.440      32.715      16.151
    9          18.327      13.574      ?1.872      12.994
   10          17.204      11.829      16.103       8.887
   11          24.284      20.742      16.581       5.722
   12          22.200      13.702      48.042      29.727
   13          18.111      13.015      19.256      11.186


TABLE 23. CLASSIFICATION STATISTICS

Iteration No. 1
Improved Cluster Centers

Class   Sym.   Nr. Sample        Spectral Mean Values
  1      W         4989     20.610   12.820    7.182    1.580
  2      /        27438     22.687   17.090   34.176   20.001
  3               73098     20.460   16.045   26.561   15.460
  4      /         8085     23.066   15.934   41.042   24.328
  5               35093     24.479   21.164   29.609   15.812
  6               27224     20.378   16.814   19.339   10.084
  7               46850     22.475   20.383   24.094   12.449
  8                8919     30.166   27.576   32.653   16.083
  9              103418     18.873   14.254   22.403   13.079
 10               23617     17.294   11.848   15.844    8.652
 11      W         1287     24.261   20.545   16.129    5.169
 12      /         2341     22.196   13.730   47.841   29.501
 13               50741     18.147   13.043   19.306   11.177

Class   Sym.   Nr. Sample        Spectral Standard Deviations
  1      W         4989      1.435    1.395    2.134    1.293
  2      /        27438      1.613    2.055    2.088    1.505
  3               73098      1.346    1.447    1.995    1.490
  4      /         8085      1.671    2.062    1.875    1.549
  5               35093      1.852    1.902    1.841    1.461
  6               27224      1.080    1.563    1.564    1.184
  7               46850      1.660    1.825    1.612    1.196
  8                8919      3.985    4.492    2.467    2.234
  9              103418      1.137    1.?10    1.223    1.069
 10               23617      1.093    1.151    1.655    1.166
 11      W         1287      2.145    2.465    2.817    2.2??
 12      /         2341      1.395    1.255    2.859    2.351
 13               50741      0.981    1.095    1.031    0.847


Figure 40. Double-track boundary map of MSFC area. Classes: urban U, V; forest F, R, T, Z;
water •; agriculture (pasture) X; agriculture (crops) A, B. Map labels: Ward Mountain,
Governor's Drive.

If the 78 distances between the 13 classes are ordered, the 2 distances between
the F category and the 2 crop categories rank fifth and sixth. These classes are close
enough so that the condition of illumination holds the key as to which classification of a
sample will occur.
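
The ordering of inter-class distances can be sketched as follows. Thirteen classes give 13 x 12 / 2 = 78 distinct pairs. The example uses the Table 22 input mean vectors for the eleven classes whose entries are fully legible (classes 2 and 9 are left out), so it yields 55 pairs rather than 78. Which symbols correspond to which class numbers is not assumed here, and the function name is illustrative.

```python
import math
from itertools import combinations

# Input mean vectors from Table 22. Classes 2 and 9 are omitted because
# one entry of each is not legible.
MEANS = {
    1: (20.804, 12.958, 7.166, 1.560),
    3: (20.279, 15.884, 26.801, 15.770),
    4: (23.055, 15.818, 41.382, 24.664),
    5: (24.683, 21.429, 28.969, 15.322),
    6: (20.234, 15.957, 17.980, 9.159),
    7: (23.438, 21.395, 24.490, 12.448),
    8: (30.222, 27.440, 32.715, 16.151),
    10: (17.204, 11.829, 16.103, 8.887),
    11: (24.284, 20.742, 16.581, 5.722),
    12: (22.200, 13.702, 48.042, 29.727),
    13: (18.111, 13.015, 19.256, 11.186),
}


def ranked_distances(means):
    """All pairwise Euclidean distances between class means,
    sorted from the closest pair to the most distant pair."""
    pairs = [
        (math.dist(means[a], means[b]), a, b)
        for a, b in combinations(sorted(means), 2)
    ]
    return sorted(pairs)


ranking = ranked_distances(MEANS)
for rank, (distance, a, b) in enumerate(ranking[:6], start=1):
    print(f"{rank}: classes {a} and {b}, distance {distance:.2f}")
```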

The Jones Valley area is a large cattle grazing area planted entirely with grass 
(Kentucky 31 Fescue). By far the most striking features in the IR channels of the fall 
ERTS imagery are areas of cattle grazing. These areas possess an extremely high IR 
reflectance, which enables the CSCP to make a ready classification. 

Figure 41 is a blown up section of one of the RB-57 images showing the Jones 
Valley area. Figure 42 is the classification map with the five Level I symbols. Figure 43 is 
the double-track boundary map with different symbols for the urban, forest, and crop 
categories. Compare Figures 42 and 43 with Figures 4 and 5 also. 

For a sample of how good the classification can be, note the four urban symbols
in the center of the large cattle grazing area. These cover the area of the farmhouse and
other farm buildings. Approximately 11 scans above and 4 samples to the left of this area
is a single forest classification. This is a small group of pecan trees whose area is not
quite the size of a resolution element. The areas classified urban on the northern and
western fringes of the pasture are accurately classified residential sections. The only
misclassifications of any note are located on the small mountain west of Jones Valley.
The classification is B crops where it should be urban. This residential zone has many
trees and this misclassification is understandable. Notice that the areas surrounding these B
classified areas are the expected less dense F and T forest categories.

CRITIQUE 

To compare the time required for the unsupervised mode with that required for
the supervised mode, an area three-fourths the size of the total 411,480 pixel area was
classified. The IBM 7094 time required was approximately 3 hours and 50 minutes with
three iterations. In practice, an area one-fourth this size is probably all that is necessary
for clustering. The total IBM 7094 time for the iteration or classification map was 1.5
hours.


Due to limited manpower no effort has been made to streamline the CSCP. 
Certainly some of the subroutines in the program are not as efficiently programmed as is 
possible, and, with some rearrangements, there is no doubt that considerable time and 
storage could be saved. 

Figure 41. Photograph of Jones Valley area.

Classification of the data with the CSCP is generally very good, but could be
improved. In Reference 9, the K-means classification is not based entirely on a
Euclidean minimum distance, but is actually the Euclidean distance normalized by a
“characterized” variance. This characterized variance is simply the magnitude of the class
variance vector. The reason each distance is not normalized using the vector variance is
that the time required, with the present efficiency of the CSCP, would be prohibitive.
The characterized variance is a good normalization factor if all components of the class
variance vector are approximately the same. With the Huntsville ERTS data, experience
has shown that most of the classes satisfy this condition readily. With the characterized
variance the classification accuracy of the CSCP would probably be improved
considerably.
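
The two normalizations discussed above can be compared in a short sketch. The exact way the distance is divided by the characterized variance is not spelled out in this section, so the form used below (squared Euclidean distance divided by the magnitude of the variance vector) is an assumption, as are the function names.

```python
import math


def characterized_variance(std_devs):
    """Magnitude of the class variance vector; each variance is the
    square of the corresponding channel standard deviation."""
    return math.sqrt(sum((s * s) ** 2 for s in std_devs))


def normalized_distance(sample, mean, std_devs):
    """ASSUMED form: squared Euclidean distance divided by the
    characterized variance (one scalar per class)."""
    d2 = sum((x - m) ** 2 for x, m in zip(sample, mean))
    return d2 / characterized_variance(std_devs)


def per_channel_distance(sample, mean, std_devs):
    """Distance normalized channel by channel with the full variance
    vector, shown for comparison."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(sample, mean, std_devs))
```

The scalar form needs only one division per class, whereas the per-channel form needs one per channel. When all components of the variance vector are equal, the two differ only by a constant factor, which is consistent with the remark above that the characterized variance is a good normalization factor in that case.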

Figure 42. Computer map of Jones Valley area.

Distinctions and Similarities of the Composite Sequential
Clustering Program (CSCP) and the Spatial and Spectral
Clustering Program (SSCP)

Although the same end products are desired, the initial efforts on both programs 
are different. In order to establish cluster populations, the CSCP uses spatial information 
only in one direction, by considering consecutive samples within a scan. Actually the 
spatial information is secondary, since statistical tests, based upon spectral information, 
are used to decide which samples belong to a cluster. As a consequence, no spatial 
information is used in relating samples in different scans, but only spectral information. 
Since data samples for each cluster are acquired from all over the data set, it is often 
difficult to relate the make up of a cluster to the ground truth, and because the cluster 
populations are continuously recalculated while reading through the raw data tape, there 
is no guarantee that a sample originally included in the cluster at the beginning of the 
tape would still belong to that cluster when the end of the raw data tape is reached. In 
other words, the clusters and their statistics are highly dependent on where the analysis is 
initialized in the data set. In order to overcome this problem, an iteration procedure is 
used to recalculate the cluster statistics until the clusters and their statistics stabilize. In 
producing a classification map, every data sample is classified as belonging to one of the 
classes. 


In producing a boundary map, the SSCP converts spectral information to spatial 
information in two directions. The locations of the clusters are then defined on the 
boundary map, which provides the visibility necessary in relating the clusters to the 
ground truth. The formation of the clusters is not dependent on where the analysis is 
initiated, and, therefore, no iteration is used. Also, not all of the data samples are 
classified in producing a classification map. The samples not belonging to any of the
classes, based upon the input parameters for the decision rule, are left unclassified.

Both programs use a merging procedure which determines the number of classes. 
The CSCP has an automatic spectral merging procedure for keeping the number of classes 
equal to or less than 30 and a manual merging procedure for combining classes. The 
SSCP has an automatic spectral merging procedure, but it is not based upon the number 
of classes. The manual merging procedure is done by assigning two or more classes the 
same computer printout symbol. 

There is always a tradeoff between the degree of complexity and flexibility, and 
how automatic a program can be made. In this respect, the CSCP is more automatic and 
the SSCP is more flexible and complicated to run. Using both programs to analyze a data 
set plus reference to some visual photographic coverage of the area does, however, 
provide more visibility to what is going on in the data. For example, compare the class 
mean vectors in Tables 4, 8, 19, with Table 22, and the classification maps shown in 
Figures 28, 29, 30, and 31 with the maps shown in Figures 33, 34, 35, 36, and 37. 

Mr. Charles Dalton, of the Aerospace Environment Division, has critically
reviewed both programs and made recommendations for their improvement. His ideas 
have been published in References 13 and 14, which provide a more detailed basis for 
comparison. 

It is difficult to compare the running times of the two programs accurately, since 
they produced different numbers of classes and were run on areas of different size. 
Nonetheless, a rough comparison can be made from the following numbers. The 
CSCP originally clustered a 411,480-sample data set and obtained 15 classes 
in 3 hours 50 minutes with 3 iterations. The SSCP was run on a 257,550-sample 
data set and obtained 26 classes in 1 hour 17 minutes, which includes producing the 
boundary map. To compare the classification time only, the CSCP was run with 
only 1 iteration on the 411,480-sample data set and ran 1 hour 30 minutes for 13 
classes. The SSCP ran for 24 minutes 35 seconds on the 257,550-sample data set 
with 26 classes and for 28 minutes 38 seconds using 38 classes. 
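
One way to put the classification-only runs on a common footing is to express each as samples processed per minute. The sketch below does only that arithmetic on the figures quoted above; the helper name samples_per_minute is an assumption, and the result remains a rough comparison because the runs differ in number of classes and in area.

```python
def samples_per_minute(samples, hours=0, minutes=0, seconds=0):
    """Average number of samples processed per minute of run time."""
    total_minutes = hours * 60 + minutes + seconds / 60
    return samples / total_minutes

# Classification-only runs quoted in the text.
rates = {
    "CSCP, 13 classes": samples_per_minute(411_480, hours=1, minutes=30),
    "SSCP, 26 classes": samples_per_minute(257_550, minutes=24, seconds=35),
    "SSCP, 38 classes": samples_per_minute(257_550, minutes=28, seconds=38),
}

for run, rate in rates.items():
    print(f"{run}: {rate:,.0f} samples per minute")
```

On these figures the SSCP classification pass handles roughly twice as many samples per minute as the single-iteration CSCP pass, even with two to three times as many classes.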

General Comments and Recommendations 

At this time it might be worthwhile to review what the report on the computer 
analysis did and did not cover. The report is a demonstration of what can be done with 
the data, but the application part is somewhat lacking. This was mainly because there was 
no readily available user to work with, but an attempt was made by the authors based on 
their experiences with users to anticipate what a potential general land survey user might 
like to have in terms of results. It is hoped that this report will provide a common basis 
for discussions concerning the specific needs of potential users. In this respect, the 
computer analysis was directed more toward user agencies than toward 
the man on the street, with the hope that the man on the street would ultimately derive 
the benefits from the interpretations of such an analysis. It is envisioned that the results 
of a similar type analysis could eventually be used as valuable inputs to an information 
management system as a basis for making future decisions on land use and environmental 
impacts. 


As a result of the experience obtained in performing the computer analysis, 
several recommendations can be made for immediate specific efforts. First, there is a 
need to produce a geographically accurate map from the classification results. 
Second, an adequate display facility is needed. Third, to improve the accuracy of 
the analysis, seasonal data need to be incorporated. For example, if ERTS 
data were obtained for four seasons and classification maps were made for each season, it 
would be possible to identify those areas that were plowed and label them specifically as 
cropland. In addition, if the planting and harvesting dates were known for the various 
crop types, the information could be used in conjunction with the classification map to 
distinguish between different crop types. An alternative would be to register and 
combine the 4 ERTS tapes into one data tape containing 16 channels, and analyze all 16 
channels simultaneously. The feature signatures would then contain information as a 
function of time as well as spectral information, and provide a means of detecting 
changes that had occurred in the ground scene. 
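
The 16-channel alternative can be sketched as follows, assuming the four seasonal data sets have already been registered so that the same index refers to the same ground location on every date. The function name stack_dates and the synthetic pixel values are illustrative assumptions; registration itself is not shown.

```python
def stack_dates(date_images):
    """Combine registered images from several dates into one image whose
    per-pixel vector holds every channel of every date, in date order."""
    n_pixels = len(date_images[0])
    if any(len(image) != n_pixels for image in date_images):
        raise ValueError("all dates must cover the same registered pixels")
    return [
        tuple(value for image in date_images for value in image[i])
        for i in range(n_pixels)
    ]

# Synthetic data: 4 dates, 3 pixels, 4 channels per pixel (values illustrative).
date_images = [
    [tuple(100 * season + 10 * pixel + ch for ch in range(4)) for pixel in range(3)]
    for season in range(4)
]

stacked = stack_dates(date_images)  # each pixel now carries 16 values
```

Each stacked vector can then be clustered or classified in the same way as a 4-channel sample, with the time dependence carried by the channel position.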

Extending the analysis over 4 tapes to classify an entire ERTS image does not 
appear to be a problem, but it is not known what problems would be encountered in 
extending the analysis from image to image to cover an entire state. Assuming that no 
problem exists in using classes obtained from one image to classify another image, that 4 
channels of data are used, that 13 classes are desired and used, and that the IBM-7094 is 
used, a land use survey for the State of Alabama using the CSCP could be completed in a 
minimum of approximately 100 computer hours. At a 
current rate of $60/hour, the survey would cost a minimum of $6000, or roughly 13¢ a 
square mile. Using the SSCP under the same assumptions, except that 38 classes are desired 
and used, the survey would take a minimum of 
approximately 52 computer hours. Thus, the survey would cost a minimum of $3120, or 
roughly 6¢ a square mile. The most critical part of the analysis is establishing the 
signatures of the desired features, which could considerably alter the computer time 
either way depending on how many classes are used. 
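
The cost estimates follow from simple arithmetic, sketched below. The computer hours and the $60/hour rate are the figures quoted above (100 hours for the CSCP; $3120 at $60/hour corresponds to 52 hours for the SSCP). The state area of about 51,600 square miles is an assumed value that the report does not state, so the per-square-mile results are indicative only.

```python
def survey_cost(computer_hours, rate_per_hour=60.0):
    """Minimum survey cost in dollars for a given amount of computer time."""
    return computer_hours * rate_per_hour

def cents_per_square_mile(cost_dollars, area_sq_mi):
    """Cost expressed in cents per square mile of surveyed area."""
    return 100.0 * cost_dollars / area_sq_mi

ALABAMA_AREA_SQ_MI = 51_600  # assumed approximate area; not given in the report

cscp_cost = survey_cost(100)  # CSCP, 13 classes
sscp_cost = survey_cost(52)   # SSCP, 38 classes
cscp_cents = cents_per_square_mile(cscp_cost, ALABAMA_AREA_SQ_MI)
sscp_cents = cents_per_square_mile(sscp_cost, ALABAMA_AREA_SQ_MI)

print(f"CSCP: ${cscp_cost:,.0f}, about {cscp_cents:.1f} cents per square mile")
print(f"SSCP: ${sscp_cost:,.0f}, about {sscp_cents:.1f} cents per square mile")
```

With the assumed area, the per-square-mile values come out of the same order as the rounded figures quoted above; the exact cents depend on the area used.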

Before a land use survey of this magnitude is attempted, it is recommended that 
considerable attention be given to the type of features that are desired and can be 
identified in the data. Finally, both programs need to be re-examined to minimize 
their running times. 


George C. Marshall Space Flight Center 

National Aeronautics and Space Administration 

Marshall Space Flight Center, Alabama, April 1974 


APPENDIX 


GROUND TRUTH IMAGERY AND SUPPLEMENTARY LARGE SCALE 
PHOTOMOSAICS OF THE LAND USE STUDY AREA 


While conducting this land use study, considerable ground truth imagery was 
obtained or collected by the participants. Some of the resulting aerial photomosaics and 
individual black and white ground-obtained photographs, as well as photo indexes of 
ground-obtained color imagery, are included. 

This imagery was used in various ways to help interpret actual land use 
or conditions from the aerial photography and the automated classification scheme 
printouts. By including small scale photographs of this ground truth information in the 
report, it is hoped that interested persons desiring to use the imagery will contact one of 
the authors for access to it. 

The following items are included in this appendix: 

1. Figure 44. Aerial photomosaic of Elk River, S.W. portion. 

2. Figure 45. Ground truth imagery of S.W. portion of Elk River. 

3. Figure 46. Aerial photomosaic of Elk River, N.E. portion. 

4. Figure 47. Ground truth imagery of N.E. portion of Elk River. 

5. Figure 48. Aerial photomosaic of Jones Valley and East Huntsville, Ala. 

6. Figure 49. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 10, 1973. 

7. Figure 50. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 17, 1973. 

8. Figure 51. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 24, 1973. 


Figure 44. Aerial photomosaic of Elk River, S.W. portion. 

Figure 45. Ground truth imagery for S.W. portion of Elk River. 

Figure 48. Aerial photomosaic of Jones Valley and East Huntsville, Alabama. 

NOTES: 1. IMAGERY RECORDED FROM POINTS A, B, C, D, AND E. 
2. VECTOR NUMBERS ARE SLIDE NUMBERS. 
3. ARROWS SHOW PRINCIPAL POINT LOCATIONS. 

Figure 49. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 10, 1973. 

Figure 50. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 17, 1973. 

Figure 51. Photo index for ground truth imagery covering Jones Valley 
rangeland, dated Nov. 24, 1973. 